Monitor services and restart them if they die

Q I am running a number of services on my server - is there a way that I can monitor these services, and restart them if they die? I wondered about using some sort of Cron task.

A There are a number of programs written specifically for this task. The most popular is probably Mon, which you can get from www.kernel.org/software/mon. It has quite a long list of dependencies, mainly Perl modules, so it is most convenient to install it with your distro's package manager. Mon can be installed on the computer that you wish to monitor or on any other computer that can reach it over the network. The latter is the better choice, because a monitor running on a separate machine can still let you know if the server dies altogether. Mon is controlled by a config file located in /etc/mon (usually named mon.cf). Here's an example section that monitors a web server:

hostgroup servers www.example.com

watch servers
    service http
        interval 5m
        monitor http.monitor
        period wd {Sun-Sat}
            alertevery 1h
            alert mail.alert webmaster@example.com

This will attempt to connect to the web server every five minutes and email an alert if it fails. The period line sets the days on which alerts are sent, here the whole week. The alertevery parameter means that although Mon continues to check every five minutes, it will not send a mail on every consecutive failure, only nag you once an hour. Mon can monitor more than network services: it can also keep track of things like disk space and processes, which could help you prevent a rogue program or denial of service attack from stopping the server completely. There are other alert options supplied with Mon, including pager alerts (after all, there's no point in an email alert if the mail server has just died).
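To make the interaction between interval and alertevery concrete, here is a minimal Python sketch of the throttling idea. It is not Mon's own code: the AlertThrottle class and simulate function are hypothetical names, and the sketch assumes that a successful check ends the outage and resets the timer, which may differ from how Mon itself handles recovery.

```python
from datetime import datetime, timedelta


class AlertThrottle:
    """Suppress repeat alerts during an ongoing failure (illustrative only)."""

    def __init__(self, alert_every):
        self.alert_every = alert_every
        self.last_alert = None  # time of the last alert in the current outage

    def should_alert(self, check_ok, now):
        if check_ok:
            # Assumption: a successful check ends the outage and resets the timer.
            self.last_alert = None
            return False
        if self.last_alert is None or now - self.last_alert >= self.alert_every:
            self.last_alert = now
            return True
        return False


def simulate(results, interval=timedelta(minutes=5), alert_every=timedelta(hours=1)):
    """Return the indices of the checks that would trigger an alert."""
    throttle = AlertThrottle(alert_every)
    start = datetime(2024, 1, 1)  # arbitrary starting time for the simulation
    return [
        i for i, ok in enumerate(results)
        if throttle.should_alert(ok, start + i * interval)
    ]
```

With thirteen consecutive failed checks at five-minute intervals, simulate([False] * 13) reports an alert on the first failure and again on the check one hour later, with the eleven failures in between staying silent.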

Monitors and alerts are Perl scripts, so you can customise them or build your own. The Mon website has a collection of user-contributed monitors and alerts, so you can be nagged by AIM or text message if you really want.

Another program worth considering is Monit (www.tildeslash.com/monit). This works in a similar way to Mon, but is designed to run on the server itself and to take corrective action rather than disturb the sysadmin. Monit is able to restart a service that has died. It also has a built-in web server that enables you to log in from a remote computer to check on the status of monitored services. The safest approach is to run Mon remotely and Monit locally: Monit restarts failed services on the spot, while Mon can still alert you if the whole server goes down.
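As for the Cron idea in the question, the check-and-restart behaviour described above can be approximated with a small script run from cron. The following Python sketch only illustrates the principle and is not how Monit is implemented. The function names are hypothetical, and the host, port and restart command you pass in are assumptions that depend on your system.

```python
import socket
import subprocess


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_and_restart(host, port, restart_cmd, probe=port_is_open, run=subprocess.run):
    """Probe the service once and run restart_cmd if the probe fails.

    Returns "ok" if the service answered, "restarted" if the restart
    command exited with status 0, and "restart-failed" otherwise.
    """
    if probe(host, port):
        return "ok"
    # restart_cmd is a list such as ["systemctl", "restart", "apache2"]
    # (a hypothetical example; use whatever restarts the service on your distro).
    result = run(restart_cmd, check=False)
    return "restarted" if result.returncode == 0 else "restart-failed"
```

A cron job could call check_and_restart("127.0.0.1", 80, [...]) every few minutes with a suitable restart command. The probe and run parameters exist only so the logic can be tested without touching a real service. A dedicated tool such as Monit remains the better choice, since a script like this offers no alerting and no limit on the number of restart attempts.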
