Like any other monitoring system, Nagios can produce false alarms. This usually happens when Nagios fails to get a reply from the monitored host within a predefined timeout. Before marking a service as down, Nagios performs three checks; if all of them fail, the service is marked down and the administrator gets an alert about its critical status. Depending on the configuration, even a single failed check can be reported to the administrator (by e-mail, Twitter, chat message, SMS, etc.).
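
The three-check behaviour described above is controlled by the max_check_attempts directive of a service definition. Here is a minimal sketch; the host name web01, the generic-service template and the check_load NRPE command are assumptions for illustration and may not match your setup:

```
define service {
    use                     generic-service         ; assumed template name
    host_name               web01                   ; hypothetical host
    service_description     Load via NRPE
    check_command           check_nrpe!check_load   ; assumes check_load is defined in nrpe.cfg
    max_check_attempts      3                       ; failed checks needed before the service is considered down
    check_interval          5                       ; time units between regular checks
    retry_interval          1                       ; time units between re-checks after a failure
}
```

With these values a service that stops answering is re-checked at the shorter retry interval until the third consecutive failure.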

If you occasionally get false alarms while the service is actually online, it makes sense to increase the timeout from the default 10 seconds to, say, 20 seconds. And if you have phone call alerts configured in Nagios, this small change may well help you sleep better.

Open the Nagios config file where check commands are defined (usually /etc/nagios/commands.cfg), find the block named check_nrpe and add "-t 20" to the end of its command_line so that it looks like this:

define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 20
}

Then restart Nagios so the change takes effect.

Besides check_nrpe there are other commands such as check_http and check_smtp. All of them support the -t option, so modify them the same way as check_nrpe wherever you run into timeouts.
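
For instance, the same change applied to check_http and check_smtp might look like the sketch below. The base command lines here are assumptions modelled on commonly shipped sample definitions, so keep whatever arguments your own commands.cfg already uses and only append -t 20:

```
define command {
    command_name    check_http
    command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$ -t 20
}

define command {
    command_name    check_smtp
    command_line    $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$ -t 20
}
```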

 
