Last week we provided a brief overview of Nagios, and explained how it can make your infrastructure monitoring fun and easy. As promised, we’d like to talk a bit more about the configuration files, because that’s the biggest hurdle to setting up a lean, mean monitoring machine with Nagios. Nagios does more than just monitor services, so after some configuration clarifications we'll shift focus to explore the more creative uses of Nagios.
Get to Know Your Configs
First, note that Nagat is a Web-based configuration aid for Nagios: it presents a form for you to fill in and then generates the configuration files automagically. Second, you can compile Nagios with Postgres or MySQL support, so that log data ends up in a database where other applications can parse it more easily. Now that you’ve reached the configuration stage, it’s time to look at those configuration files in more detail.
The first file of interest is nagios.cfg. This is the main configuration file, and both the Nagios program and the Web CGIs read it. Inside, you’ll find numerous examples and comments. A frequent point of confusion among new Nagios users is the definition of a “macro.” A macro is simply a variable, written in the somewhat strange form $NAME$ and set in a resource file with a line such as $USER1$=/usr/local/nagios/libexec. Resource files are normally used to hold values that you don’t want everyone on the Internet to see, because the CGIs do not read them. Passwords are a good candidate for resource files. To define a resource file, put "resource_file=/path/to/it" in the nagios.cfg file.
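To make the resource-file idea concrete, here is a minimal Python sketch of how $NAME$-style macros from a resource file could be substituted into a command line. It is a simplified illustration, not the parser Nagios itself uses; the resource file contents, the $USER3$ password macro, and the check_mysql command line are example values chosen for the sketch.

```python
import re

MACRO = re.compile(r"\$[A-Z0-9_]+\$")

def parse_resource_file(text):
    """Parse lines of the form $NAME$=VALUE into a dict, skipping comments and blanks."""
    macros = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and MACRO.fullmatch(name.strip()):
            macros[name.strip()] = value.strip()
    return macros

def expand(command, macros):
    """Replace every known $NAME$ macro; unknown macros are left untouched."""
    return MACRO.sub(lambda m: macros.get(m.group(0), m.group(0)), command)

# Example resource file: keep the real one readable only by the Nagios user.
resource_cfg = """
# resource.cfg
$USER1$=/usr/local/nagios/libexec
$USER3$=s3cr3t
"""

macros = parse_resource_file(resource_cfg)
print(expand("$USER1$/check_mysql -H $HOSTADDRESS$ -u nagios -p $USER3$", macros))
```

In this sketch $HOSTADDRESS$ is left alone because it is not a resource macro; in Nagios it would be filled in from the host definition at check time.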
Of particular importance are the configuration options related to check scheduling and process management. Nagios schedules checks intelligently to avoid forking hundreds of processes every minute: it schedules a check for everything when it starts, but the checks are interleaved to minimize server and client load. If you want specific checks to run closer to the times you’ve scheduled them, you can reduce the interleaving by adjusting the service_interleave_factor option; a value of 1 disables interleaving. You can also turn off “smart” scheduling, but if you’re monitoring a large number of services on many hosts, you’ll be sorry.
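The idea behind interleaving can be sketched in a few lines of Python. This is a simplified model, not the Nagios scheduler: it assumes services are sorted by host, spreads check start times evenly across one average check interval, and approximates the “smart” interleave factor as the average number of services per host, rounded up. With a factor of 1, checks keep their original host-by-host order.

```python
import math

def smart_interleave_factor(services):
    """Approximate a 'smart' factor: average services per host, rounded up."""
    hosts = {host for host, _ in services}
    return math.ceil(len(services) / len(hosts))

def interleave(services, factor):
    """Reorder (host, service) pairs, already sorted by host, by taking every
    factor-th entry per pass so consecutive checks hit different hosts."""
    order = []
    for start in range(factor):
        order.extend(services[start::factor])
    return order

def schedule(services, avg_interval_s, factor):
    """Spread the interleaved checks evenly across one average check interval."""
    delay = avg_interval_s / len(services)
    return [(i * delay, pair) for i, pair in enumerate(interleave(services, factor))]

services = [
    ("web1", "HTTP"), ("web1", "PING"), ("web1", "DISK"),
    ("web2", "HTTP"), ("web2", "PING"), ("web2", "DISK"),
]
for offset, (host, svc) in schedule(services, 300, smart_interleave_factor(services)):
    print(f"t+{offset:5.1f}s  {host:5} {svc}")
```

With the smart factor (3 here), the two hosts alternate instead of one host receiving three checks in a row, which is exactly the load-spreading the option is meant to provide.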
Next come the object definitions. This is where you define your hosts and services, as well as groups, contacts and notification policies, and many other attributes. A host definition sets the parameters applied to a monitored host, and a service definition sets the attributes of a service. The term “service” in Nagios really means “any monitored attribute,” which includes disk space, logged-in users, and of course the traditional notion of services (HTTP, POP, etc.). There are tons of options, and you really need to configure the alerts to suit your needs. The Nagios documentation describes the newer, template-based method of configuration in its section on defining object data. Because templates allow far simpler definitions, anyone just starting out with Nagios should use that style.
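Template inheritance is what keeps these definitions short: an object names a template with its use directive and inherits every directive it does not set itself. Below is a simplified Python model of that lookup. It handles only single inheritance with plain overrides (real Nagios also supports multiple templates and additive values), and the generic-service template and its directive values are example data.

```python
# 'name' and 'register' identify a template and are not inherited;
# 'use' is resolved by walking the chain below.
NOT_INHERITED = ("name", "register", "use")

def resolve(obj, templates):
    """Return obj's effective directives after walking its 'use' chain.
    Values set closer to the object override values from its templates."""
    chain, seen = [], set()
    parent = obj.get("use")
    while parent is not None:
        if parent in seen:
            raise ValueError(f"template loop at {parent!r}")
        seen.add(parent)
        template = templates[parent]
        chain.append(template)
        parent = template.get("use")
    merged = {}
    # Apply the most generic template first so more specific ones win.
    for template in reversed(chain):
        merged.update({k: v for k, v in template.items() if k not in NOT_INHERITED})
    merged.update({k: v for k, v in obj.items() if k != "use"})
    return merged

templates = {
    "generic-service": {
        "name": "generic-service", "register": "0",
        "check_period": "24x7", "max_check_attempts": "3",
        "normal_check_interval": "5", "notification_interval": "60",
    },
}
http = {
    "use": "generic-service", "host_name": "web1",
    "service_description": "HTTP", "check_command": "check_http",
    "max_check_attempts": "5",
}
print(resolve(http, templates))
```

The HTTP service only spells out what makes it different; everything else comes from the template, which is why a template-based configuration stays readable as the number of hosts grows.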
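Finally, the CGI configuration file is named cgi.cfg. It defines user access controls: which authenticated users may view which information and issue which commands. The accounts and passwords themselves are normally handled by the Web server’s authentication (for example, an htpasswd file) rather than by cgi.cfg. These options are well documented, too.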
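The authorization side of cgi.cfg is a set of key=value directives whose values are comma-separated user names. The Python sketch below reads such directives and answers whether a given authenticated user holds a privilege. The directive names are real cgi.cfg options, but the user names are examples and the parser is a simplification; treating * as “any authenticated user” is an assumption you should check against the documentation for your Nagios version.

```python
def parse_cfg(text):
    """Parse key=value lines (the format of nagios.cfg and cgi.cfg), skipping comments.
    In this sketch a repeated key (such as several cfg_file lines) keeps its last value."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings

def is_authorized(settings, directive, user):
    """True if user appears in the directive's comma-separated list, or the list has '*'."""
    allowed = [u.strip() for u in settings.get(directive, "").split(",") if u.strip()]
    return "*" in allowed or user in allowed

cgi_cfg = """
use_authentication=1
authorized_for_system_information=nagiosadmin,theboss
authorized_for_all_services=nagiosadmin,jdoe
authorized_for_all_service_commands=nagiosadmin
"""
settings = parse_cfg(cgi_cfg)
print(is_authorized(settings, "authorized_for_all_services", "jdoe"))
print(is_authorized(settings, "authorized_for_all_service_commands", "jdoe"))
```

Here jdoe can see every service but cannot issue service commands, which is the kind of split between viewing and acting that cgi.cfg is designed to express.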
The real key to Nagios is that it's simply a framework: every check is an external program (a plugin) that reports its result through its exit code and a line of output. This makes it very easy to write your own plugins, or to download and use the many found throughout the Internet.
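As a minimal sketch of what a home-grown plugin looks like, here is a free-disk-space check in Python. It follows the standard plugin conventions (exit code 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN; a one-line status message; optional performance data after a |), while the 20% and 10% thresholds and the check_disk_free name are example choices, not an existing plugin.

```python
import shutil

# Exit codes defined by the Nagios plugin API.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

def evaluate(free_pct, warn_pct, crit_pct):
    """Map a free-space percentage to a plugin state (less free space is worse)."""
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK

def check_disk_free(path, warn_pct=20.0, crit_pct=10.0):
    """Return (exit_code, status_line) for the filesystem holding path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    free_pct = 100.0 * usage.free / usage.total
    state = evaluate(free_pct, warn_pct, crit_pct)
    # Text after '|' is performance data: label=value[UOM];warn;crit;min;max
    return state, (f"DISK {LABELS[state]} - {free_pct:.1f}% free on {path}"
                   f" | free={free_pct:.1f}%;;;0;100")

def main(argv):
    """Print the status line and return the exit code Nagios should see."""
    code, message = check_disk_free(argv[1] if len(argv) > 1 else "/")
    print(message)
    return code
```

Saved as a standalone script, it would import sys and finish with sys.exit(main(sys.argv)), so that Nagios reads the state from the process exit code and shows the printed line in its Web interface.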
Cool Uses for Nagios
Nagios, again, can run anything you ask it to. This includes, for example, scripts that use snmpwalk to grab statistics from SNMP-enabled devices. A really neat use of Nagios is for network security. You can configure a check that runs every few minutes, polls each of your routers, and counts the entries in their ARP tables. If you know roughly how many hosts you have, that number shouldn’t surprise you. When it increases significantly, it could indicate that malicious activity is taking place on the network.
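A rough Python sketch of such an ARP-count check follows. It assumes your routers expose the standard IP-MIB ARP table (the ipNetToMediaPhysAddress column, OID 1.3.6.1.2.1.4.22.1.2) over SNMP v2c, that snmpwalk prints one line per entry, and that the community string, baseline, and 25%/50% growth thresholds are placeholders you would tune. Only the snmpwalk call touches the network; the counting and threshold logic can be tested on captured output.

```python
import subprocess

# ipNetToMediaPhysAddress column of the IP-MIB ARP table (assumed supported).
ARP_OID = "1.3.6.1.2.1.4.22.1.2"

def walk_arp_table(host, community="public"):
    """Run snmpwalk against one router and return its raw output."""
    result = subprocess.run(
        ["snmpwalk", "-v2c", "-c", community, host, ARP_OID],
        capture_output=True, text=True, timeout=30, check=True,
    )
    return result.stdout

def count_entries(snmpwalk_output):
    """Count non-empty lines, assuming one ARP entry per line."""
    return sum(1 for line in snmpwalk_output.splitlines() if line.strip())

def evaluate(count, baseline, warn_ratio=1.25, crit_ratio=1.5):
    """Compare the ARP entry count with the number of hosts you expect to see."""
    if count >= baseline * crit_ratio:
        return 2, f"ARP CRITICAL - {count} entries (expected about {baseline})"
    if count >= baseline * warn_ratio:
        return 1, f"ARP WARNING - {count} entries (expected about {baseline})"
    return 0, f"ARP OK - {count} entries"
```

To cover several routers you would sum count_entries over each walk_arp_table result; since one host can appear in more than one router’s table, set the baseline from that combined total rather than from your raw host count.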
Security monitoring is a big deal. We don’t recommend trusting Nagios with the task of host security, but it can be useful for paging you when certain conditions are found to be true. Ideally, you’d write a script that checks a samhain database and notifies you if anything severe is amiss. For secondary host monitoring, a plugin that runs chkrootkit does exist. Chkrootkit performs some fairly advanced checks, and it is kept up to date with the latest tricks that malware employs. It is very resource-intensive, but nonetheless useful to run once per day, alongside your usual file integrity monitoring solution.
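The core of a chkrootkit wrapper is just turning its report into a Nagios state. The Python sketch below assumes that chkrootkit flags positive findings with the uppercase word INFECTED (as the versions we have seen do, while clean results say “not infected”); verify that against your version’s output. Scheduling it once a day is left to the service definition.

```python
import subprocess

def parse_chkrootkit(output):
    """Return lines flagged 'INFECTED'. The match is case-sensitive, so
    lowercase 'not infected' lines are ignored."""
    return [line.strip() for line in output.splitlines() if "INFECTED" in line]

def check_chkrootkit(output):
    """Map chkrootkit output to a Nagios (exit_code, status_line) pair."""
    hits = parse_chkrootkit(output)
    if hits:
        return 2, f"CHKROOTKIT CRITICAL - {len(hits)} finding(s): {hits[0]}"
    return 0, "CHKROOTKIT OK - nothing reported as INFECTED"

def run_chkrootkit():
    """Run chkrootkit (it usually needs root) and return its combined output."""
    result = subprocess.run(["chkrootkit"], capture_output=True, text=True, timeout=3600)
    return result.stdout + result.stderr
```

Keeping the parsing separate from run_chkrootkit means the expensive scan runs only when Nagios schedules it, while the alerting logic can be checked against saved output at any time.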
Temperature monitoring cannot be left out. There exist some very fancy (a.k.a. expensive) environmental monitoring solutions for your data center, but Nagios comes to the rescue: plugins are available that support many stand-alone temperature probes. A popular probe, TempTrax, is a very inexpensive solution for temperature monitoring.
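However a probe is read, the plugin’s last step is the same: compare the reading with thresholds and report it, ideally with performance data so it can be graphed later. The Python sketch below shows only that step; the 30 °C and 35 °C thresholds are example values, and reading the probe itself is left to the plugin for your particular hardware.

```python
def temperature_status(celsius, warn=30.0, crit=35.0, label="temp"):
    """Build a Nagios (exit_code, output) pair with performance data.
    Perfdata format: label=value[UOM];warn;crit (no UOM for degrees)."""
    if celsius >= crit:
        code, word = 2, "CRITICAL"
    elif celsius >= warn:
        code, word = 1, "WARNING"
    else:
        code, word = 0, "OK"
    return code, f"TEMP {word} - {celsius:.1f} C | {label}={celsius:.1f};{warn:g};{crit:g}"

print(temperature_status(22.5))
```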
A Nagios writeup wouldn’t be complete without mentioning its reporting mechanisms. Nagios produces trend and availability reports that are very accurate because they do not paper over gaps in the data. For example, if Nagios was not running, or if logs from a certain time period mysteriously go missing, those periods are reported as “undetermined” rather than guessed at. Many people use Nagios to report on SLA compliance to their clients.
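The arithmetic behind an availability report, and why “undetermined” time matters for an SLA, can be sketched in Python. This is an illustration, not the algorithm Nagios uses: it assumes non-overlapping state intervals in seconds, counts any uncovered time as UNDETERMINED, and uses an example 95% target. Whether undetermined time counts for or against you is a policy decision for you and your client.

```python
from collections import Counter

def availability(intervals, period_start, period_end):
    """Return the percentage of the report period spent in each state.
    intervals: non-overlapping (start, end, state) tuples in seconds.
    Time not covered by any interval is counted as UNDETERMINED."""
    totals = Counter()
    for start, end, state in intervals:
        start, end = max(start, period_start), min(end, period_end)
        if end > start:
            totals[state] += end - start
    span = period_end - period_start
    totals["UNDETERMINED"] += span - sum(totals.values())
    return {state: 100.0 * secs / span for state, secs in totals.items()}

def meets_sla(report, target_pct, undetermined_counts_as_up=False):
    """Check the UP percentage against an SLA target."""
    up = report.get("UP", 0.0)
    if undetermined_counts_as_up:
        up += report.get("UNDETERMINED", 0.0)
    return up >= target_pct

# One day: up for 80,000 s, down for 2,000 s, then no data (Nagios not running).
day = availability([(0, 80000, "UP"), (80000, 82000, "DOWN")], 0, 86400)
print({state: round(pct, 2) for state, pct in day.items()})
```

In this example the host is about 92.6% UP, 2.3% DOWN and 5.1% UNDETERMINED, so it misses a 95% target unless the missing data is given the benefit of the doubt.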
Last but not least, we feel that the community deserves mention as well. One contributed add-on in particular, APAN, is extremely valuable. The Advanced Performance Addon for Nagios provides a Nagios-integrated Web page that displays RRD graphs, so you can view your graphs of network usage, load, etc., all in one interface. There are also more generic add-ons, like NagiosGrapher, that will generate graphs of all your Nagios data.
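Graphing add-ons need numbers, and the plugin convention for supplying them is performance data after a | in the plugin output, in the form label=value[UOM];warn;crit;min;max. The Python sketch below shows that parsing step in simplified form; it is not code from APAN or NagiosGrapher, and a real grapher would go on to store the values in RRD files (for example with rrdtool update).

```python
import re

# One perfdata item: label=value[UOM];[warn];[crit];[min];[max]
# Labels containing spaces are wrapped in single quotes.
PERF_ITEM = re.compile(r"('[^']+'|[^\s=]+)=([-0-9.]+)([a-zA-Z%]*)((?:;[^;\s]*){0,4})")

def parse_perfdata(plugin_output):
    """Split plugin output at '|' and return {label: (value, uom)}."""
    _, sep, perf = plugin_output.partition("|")
    if not sep:
        return {}
    data = {}
    for label, value, uom, _thresholds in PERF_ITEM.findall(perf):
        data[label.strip("'")] = (float(value), uom)
    return data

print(parse_perfdata("PING OK - Packet loss = 0%, RTA = 0.50 ms | rta=0.5ms;100;500;0 pl=0%;20;60;0"))
```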
The graphing capabilities above elevate Nagios to a complete monitoring solution. Both its ease of configuration and the flexibility inherent in its design have made Nagios synonymous with data center monitoring. So what are you waiting for?