Saturday, February 03, 2007

Monit

Getting a little tired of this software. Monit is supposed to monitor services, let you know if they go down, and restart them if possible. In practice, monit goes down by itself (usually under high load), fails to notice that processes have recovered, and doesn't play well with zope.

It doesn't play well with zope because of the way it restarts a service: first it calls the stop program you specified (usually /etc/init.d/servicename stop), then it calls the start program you specified, and then it immediately checks whether the service is up. Zope takes a while to start up and bind to its network port, so this check fails. Monit then assumes the start failed and tries again.
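
A restart that tolerates zope's slow startup would poll the port for a while before giving up, instead of checking once right after the start command returns. The Python sketch below illustrates that idea. It is not monit's code, and the host, port, 60-second deadline and 2-second poll interval are assumptions to tune for your own setup.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: treat the service as not up yet.
        return False


def wait_for_port(host, port, deadline=60.0, interval=2.0):
    """Poll host:port until it accepts connections or `deadline` seconds pass.

    Returns True as soon as the port is up, False if the deadline expires.
    """
    start = time.monotonic()
    while True:
        if port_is_open(host, port):
            return True
        if time.monotonic() - start >= deadline:
            return False
        time.sleep(interval)


def restart(stop, start, host, port, deadline=60.0):
    """Stop, start, then wait for the service to bind before declaring failure.

    `stop` and `start` are plain callables (hypothetical wrappers around the
    init script's stop and start actions).
    """
    stop()
    start()
    return wait_for_port(host, port, deadline=deadline)
```

For zope, `stop` and `start` could be thin wrappers such as `lambda: subprocess.run(["/etc/init.d/zope", "stop"], check=True)`, where the init script path is only illustrative. The deadline is the important knob: it has to exceed zope's worst-case startup time on a loaded machine, otherwise the same false failure comes back.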

There is also the insane dependency handling. It starts with squid. Zope can act as an upstream cache for squid; that is, squid can talk ICP to it. Squid doesn't always notice when a backend zope comes back from the dead, so you have to reload squid's configuration to make it notice. I configured monit with a service called squidreverse, a custom check that makes sure the backend zopes are reachable through squid. Since that check only succeeds if the backend zopes are actually alive, I made squid depend on the zopes. I then found that if a zope process goes down, monit will stop squid, then restart zope, and finally start squid again. Not what I wanted.
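
To make the ordering concrete, here is a toy Python model of the two behaviours: the stop-dependents-first sequence monit performed here, and the restart-then-reload sequence that was actually wanted. It only illustrates the behaviour described above and is not monit's implementation; the service names `zope1` and `zope2` are hypothetical, and only direct dependents are handled.

```python
def dependents_of(depends_on, service):
    """Services whose depends_on list includes `service` (direct dependents only)."""
    return sorted(name for name, deps in depends_on.items() if service in deps)


def dependency_restart(depends_on, failed):
    """Stop-dependents-first sequence: stop everything that depends on the
    failed service, restart it, then start the dependents again."""
    dependents = dependents_of(depends_on, failed)
    return ([("stop", name) for name in dependents]
            + [("restart", failed)]
            + [("start", name) for name in dependents])


def reload_after_restart(depends_on, failed):
    """The sequence that was actually wanted: leave squid running, restart the
    dead backend, then only reload the services in front of it."""
    return [("restart", failed)] + [("reload", name) for name in dependents_of(depends_on, failed)]
```

With `depends_on = {"squid": ["zope1", "zope2"]}`, `dependency_restart(depends_on, "zope1")` gives `[("stop", "squid"), ("restart", "zope1"), ("start", "squid")]`, so squid stops serving everything, including whatever was still being handled by the healthy zope2. `reload_after_restart` keeps squid up and only asks it to reread its configuration once zope1 is back.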

I also wrote a plugin for nagios that can check monit instances. Unfortunately, the latest version of monit checks itself and treats its own restart as a problem, so the plugin reports a false error to nagios.
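
For reference, a Nagios plugin reports its result through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus one line of text on standard output. The Python sketch below shows only the evaluation step of such a monit check. How the per-service statuses are fetched from monit is left out, the `"running"` status string and the `ignore` parameter are assumptions, and skipping monit's own self-check entry is one possible workaround for the false alarm, not something the original plugin did.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(statuses, ignore=()):
    """Reduce per-service monit statuses to a single (exit_code, message) pair.

    `statuses` maps service name -> status string, e.g. {"zope1": "running"}.
    The "running" value and this dict format are assumptions; fetching the
    data from monit is not shown. Services named in `ignore` (for instance a
    hypothetical entry for monit's own self-check) are skipped.
    """
    checked = {name: state for name, state in statuses.items() if name not in ignore}
    if not checked:
        return UNKNOWN, "no services reported"
    bad = sorted(name for name, state in checked.items() if state != "running")
    if bad:
        return CRITICAL, "not running: " + ", ".join(bad)
    return OK, "%d services running" % len(checked)


def main(statuses, ignore=()):
    """Print the one-line plugin output and return the exit code."""
    code, text = evaluate(statuses, ignore)
    print("MONIT %s - %s" % (LABELS[code], text))
    return code
```

A real plugin would end with `sys.exit(main(statuses))` so that Nagios sees the exit code. Mapping every non-running state to CRITICAL is deliberately blunt; a softer version could return WARNING for transient states such as a pending restart.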

I'm therefore going back to nagios for monitoring services. It may be a bitch to set up, but I love the control.
