Wednesday, January 9, 2013

How to extend Nagios for custom monitoring

The powerful Nagios network monitoring platform lets you supercharge its capabilities with a host of available plugins. If you can't find a plugin to do just what you need, you can easily write your own – here's how.
Nagios plugins can be written in any programming language supported on the platform that's running Nagios. Bash is a popular choice for writing Nagios plugins because it is both powerful and simple.
Every valid Nagios check from a plugin has to produce a numeric exit status. Possible statuses are:
  • 0 – Everything is OK and the check completed successfully.
  • 1 – The resource is in warning state. Something is not quite right.
  • 2 – The resource is in critical state. A host may be down or a service not running.
  • 3 – Unknown state, which does not necessarily indicate a problem, but rather shows that the check cannot give a clear, unambiguous status.
A plugin can also print a text message. By default, this message is shown in the Nagios web interface and in Nagios mail alerts. Even though messages are not a requirement, you usually find them in available plugins because they tell users what is wrong without forcing them to consult documentation.
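The status codes and the message can be tied together in a small helper. The following is only a sketch: the function names nagios_status_name and report are made up for illustration and are not part of Nagios.

```shell
#!/bin/bash
# Hypothetical helpers (names are assumptions, not part of Nagios).

# Translate a numeric Nagios status into its conventional name.
nagios_status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Print "<STATE> - <message>" and return the numeric status.
# A real plugin would use "exit" instead of "return" at this point.
report() {
    local code=$1
    shift
    echo "$(nagios_status_name "$code") - $*"
    return "$code"
}
```

A plugin built this way would call something like report 1 "disk almost full" as its last step and pass the return value on as its exit status.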
A simple Nagios plugin written in Bash looks like this. This example plugin checks a specified file:

#!/bin/bash
#assign the first argument ($1) as the filename
filename=$1

#first check if the file exists. this is the first and most basic check with which you should begin
if [ ! -e "$filename" ]; then
    echo "CRITICAL status - file $filename doesn't exist"
    exit 2 #returns critical status because your worst scenario is that the file doesn't even exist

#if the previous condition passes (file exists) then next check if it is readable
elif [ ! -r "$filename" ]; then
    echo "WARNING status - file $filename is not readable."
    exit 1 #returns warning status because this state is better than having no file at all

#if the previous condition passes, check if it is a regular file and not a directory or device file
elif [ ! -f "$filename" ]; then
    echo "UNKNOWN status - file $filename is not a file."
    exit 3 #returns unknown status

#if all of the above checks pass then it's ok
else
    echo "OK status - file is OK"
    exit 0 #Return OK status
fi
Comments (which start with # in Bash) explain the code; if you need more clarification or want to learn more about Bash's file test operators, check the documentation.
Even though this example is simple, it is a good illustration of how to implement Nagios plugin logic. Always start by probing for the worst possible scenarios. Only when all checks pass should the script exit with status OK. Make sure to specify the clarifying message before exiting.
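Before wiring a plugin into Nagios, it can help to run it by hand and look at both the message and the exit status, since Nagios relies on both. The wrapper below is a minimal sketch; run_plugin is a hypothetical helper name, and the plugin path in the usage note is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: run any command and show what Nagios would see,
# that is, the exit status and the text printed on standard output.
run_plugin() {
    local output rc=0
    # Capture the output; remember the exit status if it is non-zero.
    output=$("$@") || rc=$?
    echo "exit=$rc output=$output"
}
```

For example, run_plugin /path/to/your_plugin /etc/passwd would print the exit status next to the message the plugin produced.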

Using the plugin

By default, all Nagios plugins are stored in the directory defined in the $USER1$ macro, which is set in the file /etc/nagios/private/resource.cfg. In a typical Nagios installation from the EPEL repository, $USER1$ is set to /usr/lib/nagios/plugins. The first thing you should do with your plugin is copy it to that directory. Plugins are usually owned by root and have permissions of 755. Nagios runs as the user nagios, which belongs to the nagios group, so a root-owned script must grant read and execute permissions to other users.
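The copy step can be sketched with the standard install utility, which copies a file and sets its mode in one go. The function name install_plugin and the file name in the usage note are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: copy a plugin into the plugin directory with mode 755.
# When run as root, adding "-o root -g root" to the install call would also
# set the usual ownership.
install_plugin() {
    local src=$1 dest_dir=$2
    install -m 755 "$src" "$dest_dir/"
}
```

On a system laid out as described above, this would be called as root with something like install_plugin ./my_plugin /usr/lib/nagios/plugins.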
Once you place a script in the /usr/lib/nagios/plugins directory you have to define it as a Nagios command within the file /etc/nagios/objects/commands.cfg. Let's say you named your script; add the following command definition:
# our custom file check command
define command{
        command_name    check_file
        command_line    $USER1$/ $ARG1$
}
That should be pretty clear. The variable $ARG1$ stands for the first argument passed to the Nagios command, which in our case should be the name of the file. If you want to pass more arguments, use $ARG2$ for the second argument, $ARG3$ for the third, and so on.
To start using your plugin, define it as a service in your Nagios configuration (services.cfg, for example):
define service{
        use                             local-service
        host_name                       localhost
        service_description             Check the file /etc/passwd
        check_command                   check_file!/etc/passwd
}
The above service is defined for localhost (host_name localhost) and uses the template (see the documentation on object inheritance for templates and how they work) for local-service (use local-service). The most important part is the check_command directive. It specifies the command check_file, followed by an exclamation point as a separator, followed by a file name as argument. If your plugin has more than one argument you can separate them with additional exclamation points.
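To make the exclamation-point convention concrete, the splitting can be imitated in Bash. This is only an illustration of the syntax, not how Nagios itself parses the directive; split_check_command is a made-up name.

```shell
#!/bin/bash
# Illustration only: split a check_command value on "!" the way the
# directive is read, printing the command name and the numbered arguments.
split_check_command() {
    local IFS='!'
    local -a parts
    local i
    read -r -a parts <<< "$1"
    echo "command_name=${parts[0]}"
    for ((i = 1; i < ${#parts[@]}; i++)); do
        echo "ARG$i=${parts[i]}"
    done
}
```

Calling split_check_command 'check_file!/etc/passwd' prints the command name check_file and a single argument, ARG1, holding /etc/passwd.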

Running Nagios plugins remotely

One obvious flaw of the example check_file plugin is that it works locally, which means you cannot check a file on a remote server. You can resolve this problem in a number of ways.
The first approach would be to use the ssh command to execute the script remotely. This requires you to copy the script to a remote server and make use of ssh's ability to run remote commands. It also requires you to set up passwordless key login for the Nagios server and its nagios user. If you are not sure how to do this, check this article for all the details.
The benefit of this first approach is that you have all the power and flexibility of running commands locally on the monitored server. The drawback is that the Nagios server must be able to log in to the remote server with a key and no password. This is a security risk and is not recommended for sensitive environments.
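A wrapper for the ssh approach might look like the sketch below. It relies on the fact that ssh exits with the status of the remote command, or with 255 if the connection itself fails. The function name remote_check, the SSH_BIN override, and the remote user name nagios are assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical wrapper: run a plugin on a remote host over ssh.
# SSH_BIN can be overridden for testing; it defaults to the real ssh.
remote_check() {
    local host=$1 rc=0
    shift
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    "${SSH_BIN:-ssh}" -o BatchMode=yes "nagios@${host}" "$@" || rc=$?
    # ssh reports its own failures with exit status 255.
    if [ "$rc" -eq 255 ]; then
        echo "UNKNOWN status - ssh connection to ${host} failed"
        return 3
    fi
    return "$rc"
}
```

Everything after the host name is passed to the remote side as the command to run, so the plugin's own exit status reaches Nagios unchanged unless the connection fails.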
A second and more secure approach is to use the SNMP extend feature. This requires that you have the net-snmp package (for CentOS) installed and configured on the remote server.
To use the SNMP extend command, first copy the script to the remote server. You can place it in the directory /usr/bin/, for example.
Next, add the configuration directive extend check_passwd_file /usr/bin/ /etc/passwd to the file /etc/snmp/snmpd.conf on the remote server. The syntax is extend some_alias command argument. Here comes the main inconvenience of this method – you have to define an alias for each separate check, which in our case means an alias for each separate file we want to test, because you cannot send arguments over SNMP.
Any changes in the file /etc/snmp/snmpd.conf require you to reload the snmpd service with the command service snmpd reload (for CentOS). After that you can test the new check with the snmpget command, as in snmpget -v2c -c public -OvQ NET-SNMP-EXTEND-MIB::nsExtendOutputFull.\"check_passwd_file\". This example snmpget command queries the server over SNMP version 2c with the "public" community string. The object identifier (OID) for your custom SNMP extended commands is NET-SNMP-EXTEND-MIB::nsExtendOutputFull.\"some_alias\".
Unfortunately, the above command cannot be used directly as a Nagios check. As long as snmpget can connect to the remote host and retrieve the value, it exits with status 0, indicating everything is OK, because snmpget itself ran without an error. Thus even if the file doesn't exist, Nagios sees status 0, although the output contains the correct message that the file is not there.
You can address this problem by taking advantage of a special plugin for Nagios called This plugin takes the first word from a status message and sets the status according to it. It was in anticipation of using this plugin that we set the messages in our example script to start with OK, CRITICAL, WARNING, and UNKNOWN.
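The idea of deriving the status from the first word of the message can be sketched in a few lines of Bash. This is not the code of the plugin mentioned above, only an illustration of the technique; status_from_message is a hypothetical name.

```shell
#!/bin/bash
# Illustration only: map the first word of a status message to a
# Nagios exit status. Anything unrecognized is treated as UNKNOWN.
status_from_message() {
    # Strip everything from the first space onward.
    local first_word=${1%% *}
    case "$first_word" in
        OK)       return 0 ;;
        WARNING)  return 1 ;;
        CRITICAL) return 2 ;;
        *)        return 3 ;;
    esac
}
```

Fed the text returned over SNMP, such a function restores the exit status that was lost when snmpget itself exited with 0.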
To start using the plugin, first download it, then place it in the directory /usr/lib/nagios/plugins (the $USER1$ macro) on the Nagios server. On CentOS you have to edit the script and replace /usr/local/nagios/libexec/ with /usr/lib/nagios/plugins/, which is the correct path for the script.
After that you can use just like any other plugin. First, define it as a command:
define command{
 command_name check_snmp_extend
 command_line $USER1$/ $HOSTADDRESS$ $ARG1$
}
After that define a service:
define service{
 use                 generic-service
 service_description Check For /etc/passwd
 check_command  check_snmp_extend!check_passwd_file
}
Using SNMP's extend option is only as secure as your SNMP configuration. This approach requires minimal modification on remote hosts and keeps the setup standard and in line with good security practice. You can find other Nagios add-ons for similar purposes, such as NRPE, but they require the remote installation of additional services, which is not always a good idea from a security and compatibility point of view.
As you can see, it's easy to extend Nagios with custom-written plugins. The fact that Nagios allows such extension is one of the reasons many administrators prefer it over other monitoring solutions.
