How-To: Fix service check time outs in Nagios + NRPE deployed in CentOS/RHEL 5

Once you get used to writing Nagios plug-ins and the plug-ins you write grow more complex, you may run into this error: “Service check timed out”.

If some of your service checks have this problem, you can narrow it down to these 3 values:

1. how slow is the plugin

    This is the first thing you should check: how much time your plugin needs to finish and return an exit status. Log on to the server you’re monitoring and run the plugin locally, using the time command to measure it.
    $ time /usr/lib/nagios/plugins/check_service
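
    A single run can be misleading if the plugin’s runtime varies from check to check. Here’s a minimal sketch of a helper that runs a command a few times and prints the slowest run in whole seconds. The name slowest_run is made up for this example, and it assumes your date command supports +%s (GNU date does).

    ```shell
    # slowest_run RUNS COMMAND [ARGS...]
    # Runs COMMAND the given number of times and prints the slowest
    # wall-clock time in whole seconds. Hypothetical helper, not part of Nagios.
    slowest_run() {
        runs=$1
        shift
        max=0
        i=0
        while [ "$i" -lt "$runs" ]; do
            start=$(date +%s)
            "$@" >/dev/null 2>&1      # plugin output is not needed here
            end=$(date +%s)
            elapsed=$((end - start))
            if [ "$elapsed" -gt "$max" ]; then
                max=$elapsed
            fi
            i=$((i + 1))
        done
        echo "$max"
    }
    ```

    For example, slowest_run 5 /usr/lib/nagios/plugins/check_service runs the plugin 5 times. Treat the number it prints as your value for step #1, with some headroom added since the result is rounded to whole seconds.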

2. how short is NRPE’s patience

    Once you have the value (in seconds) from step #1, check the NRPE configuration on that same server. The default location of NRPE’s configuration is /etc/nagios/nrpe.cfg
    Find the command_timeout parameter. Its value, in seconds, must be greater than the value you got in step #1.
    Once the parameter’s been set, restart the NRPE service (service nrpe restart).
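
    If you’d rather not edit the file by hand, the lookup and the change can be sketched with sed. This assumes the parameter is written as command_timeout=<seconds> with no spaces around the equals sign; the function names get_nrpe_timeout and set_nrpe_timeout are hypothetical.

    ```shell
    # Print the active command_timeout value (in seconds) from an NRPE config file.
    # Assumes the "command_timeout=<seconds>" form with no spaces around "=".
    get_nrpe_timeout() {
        sed -n 's/^command_timeout=\([0-9][0-9]*\).*$/\1/p' "$1" | tail -n 1
    }

    # Rewrite command_timeout, keeping a .bak copy of the original file.
    # Usage: set_nrpe_timeout CONFIG_FILE SECONDS
    set_nrpe_timeout() {
        cfg=$1
        seconds=$2
        cp -p "$cfg" "$cfg.bak" &&
        sed "s/^command_timeout=.*/command_timeout=$seconds/" "$cfg.bak" > "$cfg"
    }

    # Example, run as root on the monitored server:
    #   get_nrpe_timeout /etc/nagios/nrpe.cfg
    #   set_nrpe_timeout /etc/nagios/nrpe.cfg 120
    #   service nrpe restart
    ```

    The 120 in the example is only a placeholder. Pick a value comfortably above what you measured in step #1.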

3. how short is Nagios’ patience

    Nagios executes a command, check_nrpe, to connect to an NRPE service. check_nrpe has a timeout parameter, -t. This parameter must have a bigger value than the one you set in step #2.
    Log on to your Nagios server and open the commands configuration file, /etc/nagios/objects/commands.cfg
    Find the check_nrpe command definition and add the -t parameter to its command_line. For instance, if you want the timeout value to be 500 seconds, it will look like this:
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 500
    Restart the Nagios service afterwards (service nagios restart).
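
    The three values have to get bigger at each step: plugin runtime, then command_timeout, then the -t value. As a rough sanity check, that ordering can be sketched as a small shell function. check_timeout_chain is a hypothetical name, and you pass it the three numbers yourself.

    ```shell
    # check_timeout_chain PLUGIN_SECONDS NRPE_TIMEOUT CHECK_NRPE_TIMEOUT
    # Succeeds only if plugin runtime < command_timeout < check_nrpe -t.
    check_timeout_chain() {
        plugin=$1
        nrpe=$2
        nagios=$3
        if [ "$plugin" -ge "$nrpe" ]; then
            echo "command_timeout ($nrpe) must be greater than the plugin runtime ($plugin)"
            return 1
        fi
        if [ "$nrpe" -ge "$nagios" ]; then
            echo "check_nrpe -t ($nagios) must be greater than command_timeout ($nrpe)"
            return 1
        fi
        echo "OK: $plugin < $nrpe < $nagios"
    }
    ```

    For example, check_timeout_chain 90 120 500 prints OK: 90 < 120 < 500, while check_timeout_chain 130 120 500 complains that command_timeout is too small.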

In most cases, these 3 steps should do 🙂

2 thoughts on “How-To: Fix service check time outs in Nagios + NRPE deployed in CentOS/RHEL 5”

  1. Robert Thille

    Is there a way to have a per-command command_timeout? That is, we have one check which is slow, but I don’t want to up the timeouts for all the commands for that one since it may hide issues with the other commands suddenly getting slow.

  2. Mike

    Excellent information. I would like to add one comment. When you receive the error “Service check timed out after 60.02 seconds”, compare the value in the error to the existing setting value. If they are different, then the setting you are reviewing is most likely not the one you need to change. For me, after applying all of the good information above, I still had the issue.

    On the Nagios server, the /usr/local/nagios/etc/nagios.cfg file has another timeout setting. This one matched the error value. Once I increased it, the problem went away.
    service_check_timeout=60
