Another perfect example of open source software gone commercial is Zenoss. As a full-featured network and service monitoring solution, Zenoss is one of the best monitoring tools available.
Most importantly, Zenoss combines two functions. First and foremost, an enterprise environment requires host and service monitoring, with notifications. Network monitoring really means checking services, checking that hosts are up (they respond to ping), and possibly writing your own plugins to check various other aspects of a server or network device. Until now, Nagios has filled that role.
Second, once a decent monitoring solution is in place, getting time-based information becomes desirable. Memory and CPU usage are the most prevalent examples: if you're checking available swap space every so often with Nagios, you may know when you start running low. But it may be just as important to see a graph of the last week's usage. Tools like Cacti or Munin, which collect data frequently and use RRD graphs to display it, are very useful.
Zenoss fills both roles, without the annoying shortcomings prevalent in the alternative solutions. Zenoss uses the terms Availability Monitoring and Performance Monitoring to describe these two fundamental roles.
The performance of monitoring tools is important, and often overlooked until it becomes a debilitating problem. For example, if you want to chart pretty RRD graphs of system statistics like available RAM or disk space, Munin is an option. Unfortunately it's all Perl, and designed in a way that prevents it from scaling to even a moderate number of hosts. Cacti is a bit better, but monitoring close to 100 hosts is painful with either option. Along comes Zenoss.
Zenoss is written in Python and uses a MySQL backend for storage, and by all accounts it performs very well. The really great thing about corporate-backed open source is quality control. The community simply isn't responsible enough to say, "No, this won't work, re-implement it." A company with QA is.
Speaking of features, Zenoss isn't missing many. Flexibility seems to be the top priority: it can monitor hosts with SNMP, Nagios agents, SSH, Windows WMI, and various other mechanisms. Some of the claimed features are a bit inflated, such as ZenPing (marketed as Network Topology Monitoring), but the feature set is rich nonetheless.
Zenoss's primary functions involve four features:
- Inventory Tracking
- Availability Monitoring
- Performance Monitoring
- Event Monitoring and Management
Inventory tracking involves some sort of "configuration" reporting as well, but it seems very limited. Zenoss will discover your inventory and auto-populate a database. This is great for knowing which IP addresses are in use, for example, but means that "configuration" reporting is limited to an outside observer's perspective. It can tell you which servers have a Web server running, but it certainly doesn't deal with the configuration of the Web server. Of course, inventory tracking isn't limited to automatically discovered information; there are manual input capabilities too.
Availability monitoring is basically Nagios, plus. It can ping, it can monitor Windows machines, and it can pretty much do whatever you need. Even your old Nagios plugins will work with Zenoss. It generates reports too, and much better ones than Nagios is capable of.
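For readers who haven't written one, here is a minimal sketch of a Nagios-style plugin, following the usual plugin convention: the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a single status line carries optional performance data after a pipe. The disk check, the function names, and the 80/90 percent thresholds are illustrative assumptions, not something that ships with Zenoss or Nagios.

```python
import shutil

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_disk(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a disk-usage percentage to (exit_code, status_line).

    The thresholds are hypothetical defaults chosen for this example.
    """
    if used_pct >= crit_pct:
        code = CRITICAL
    elif used_pct >= warn_pct:
        code = WARNING
    else:
        code = OK
    # Text before the pipe is for humans; text after it is performance data.
    line = "DISK %s - %.1f%% used | used_pct=%.1f%%;%.0f;%.0f" % (
        STATUS_NAMES[code], used_pct, used_pct, warn_pct, crit_pct)
    return code, line


def main(path="/"):
    """Check one filesystem, print the status line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print("DISK UNKNOWN - %s" % exc)
        return UNKNOWN
    code, line = evaluate_disk(100.0 * usage.used / usage.total)
    print(line)
    return code

# As a standalone plugin, the script would end with:
#   import sys; sys.exit(main())
```

Because the whole contract is an exit code plus one line of output, any monitoring system that can run a command and read its result can reuse a check like this, which is presumably why old plugins carry over so easily.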
Host monitoring, performance monitoring, or whatever you'd like to call it, is quite robust in Zenoss. Some would think it's light on features, but there's a good reason that Zenoss requires you to use SNMP: it's much more scalable than SSH'ing to each server every minute. A bit of up-front configuration is required, in that all your hosts will need SNMP configured and working, but it's completely worth it. Like Cacti and Munin, Zenoss uses RRD graphs, and it can generate events and alerts based on pre-defined thresholds.
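The threshold idea is simple enough to sketch: each collected sample is compared against a configured minimum or maximum, and a breach becomes an event. The `Threshold` class, the `check_samples` function, and the event fields below are hypothetical names invented for illustration; they are not Zenoss's own classes or schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Threshold:
    """Hypothetical min/max threshold on one collected metric."""
    name: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    severity: str = "warning"

    def violated_by(self, value: float) -> bool:
        """True when the value falls outside the allowed range."""
        if self.min_value is not None and value < self.min_value:
            return True
        if self.max_value is not None and value > self.max_value:
            return True
        return False


def check_samples(device: str,
                  samples: List[Tuple[int, float]],
                  threshold: Threshold) -> List[Dict]:
    """Return one event dict per (timestamp, value) sample that breaches the threshold."""
    events = []
    for timestamp, value in samples:
        if threshold.violated_by(value):
            events.append({
                "device": device,
                "timestamp": timestamp,
                "severity": threshold.severity,
                "summary": "%s: value %.1f outside limits" % (threshold.name, value),
            })
    return events
```

Given CPU samples of 10, 95, and 50 percent and a maximum of 90, only the middle sample would produce an event.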
Finally we come to event monitoring. Zenoss is also encroaching on Splunk's territory a bit. It can combine syslog, availability monitoring alerts, SNMP traps, and even Windows event log data. Much like Splunk, Zenoss correlates similar events for easier viewing and troubleshooting. This is the portion that processes all events and generates alerts to pagers or e-mail, taking into account the escalation procedure you've defined.
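To give a feel for what correlating similar events means in practice, here is a minimal sketch that collapses repeated events into a single row with a count. The choice of fields that make two events "the same" (device, component, severity, summary) and the `correlate` function itself are assumptions made for this example, not a description of how Zenoss actually fingerprints events.

```python
def correlate(events):
    """Collapse events that share a fingerprint into one row with a count.

    Each event is a dict with 'device', 'component', 'severity',
    'summary', and 'time' keys (a hypothetical schema for this sketch).
    """
    rows = {}
    for event in events:
        key = (event["device"], event["component"],
               event["severity"], event["summary"])
        if key in rows:
            # Seen before: bump the counter and remember the latest time.
            rows[key]["count"] += 1
            rows[key]["last_seen"] = event["time"]
        else:
            rows[key] = dict(event, count=1,
                             first_seen=event["time"],
                             last_seen=event["time"])
    # Dicts preserve insertion order, so rows come back in first-seen order.
    return list(rows.values())
```

A flapping interface that logs the same syslog message fifty times then shows up as one row with a count of 50, rather than fifty separate lines to scroll through.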
To top it all off, the Zenoss Web interface is top-notch. It includes a customizable dashboard for monitoring, and everything is AJAX-driven, providing a user experience similar to Splunk and Google's Gmail.
Marketing fluff aside, Zenoss really does provide a wonderful product. It is, of course, open source and available for free.
At last year's LISA conference, Zenoss gave a demonstration that sadly coincided with free beer time. Arriving toward the end, I demanded one of their free baseball caps, and sat to listen to the last few audience questions. One thing was very obvious: everyone in the room was excited about this product. If hardcore sysadmins are excited, you know this is something worthwhile.
Zenoss is very functional and full of features. It may even be possible to replace three separate pieces of software with this one product: your host inventory database, Nagios, and your performance monitoring tool of choice. Maybe even Splunk some day. We can't wait to see what features they will be adding next.