Nagios is one of the standard NMS (Network Monitoring Systems) available to businesses today. Its wealth of features provides a very flexible system, and it’s scalable and customisable, so it should fit into the most demanding environments. But is this all that’s available to assist in keeping an eye on your IT infrastructure 24×7?
No, that’s the simple answer. The longer answer is more complicated and depends on what your System Administrators can support from an application point of view. A lot of the NMS systems available are your standard Perl / C implementation on a flat file or MySQL/PostgreSQL backend, so they should be supported on most systems. Others, however, require Java or Python support. Whilst this is simple enough to install and run on most Linux distributions, what happens when things go wrong? How many companies have IT staff who have taken the time to learn Java/Python for the next big Web 2.0 site?
I’ve used Nagios for a while now (some 5 years) and have gone back to the start at various times to look over the alternatives and see what they offer in terms of features and integration. Integration, that’s the key for me when it comes to choosing an NMS system. If it’s not able to offer a level of integration with the systems we already have (and we’re willing to do some work to make that happen), then it’s a non-starter, no matter how laden with features it might be. Over the past couple of years a wider choice has become available, which makes choosing to switch that much harder! Having the luxury of time to test these newer systems isn’t something that’s available to everyone, including me, so my ‘experience’ of other systems is limited compared to my time with Nagios. However, I know what I want and need, so it doesn’t always take long before I commit an app to the dusty code graveyard.
Let’s get into some of the more popular NMS systems available at present and what my impression of them has been…
OpenNMS is actually a really nice application to use for Network Monitoring. Its discovery feature works really well, and being configurable from XML files makes it extremely easy to set up and maintain. Installation is relatively straightforward if you are using the pre-packaged versions available or are a dab hand with Tomcat. The pre-configured range that I created for Network discovery worked fine and it detected all devices which responded to ICMP. Monitoring of individual services/interfaces was simple, if time consuming when you have a large selection of devices to monitor.
The bad points for me start with the fact that auto-discovery is the primary way of adding new devices to be monitored. You can add new devices via the command line on the NMS server, which isn’t too much of an issue depending on how many new devices are added to your infrastructure on a day by day basis. If it’s a sizable amount then this isn’t going to be an option for long, and with no way to add single devices via a web interface, your only option left is some form of integration by way of a script. The next problem is managing the individual services/interfaces that are available for a particular device. This again appears to be a manual process with no easy way to integrate into your current NOC.
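As a rough idea of what that script-based integration might look like, here is a minimal Python sketch that works out which devices from your own inventory are not yet monitored. The function name and the sample addresses are purely illustrative assumptions, and the sketch deliberately stops short of calling any OpenNMS command, since the exact command depends on your install.

```python
def devices_to_add(inventory, monitored):
    """Return the devices in the inventory that are not yet monitored.

    Both arguments are iterables of hostnames or IP addresses. The result
    is sorted so that repeated runs produce a stable, diff-friendly list.
    """
    return sorted(set(inventory) - set(monitored))


if __name__ == "__main__":
    # Hypothetical data: in practice these would come from your asset
    # database and from an export of the NMS's current node list.
    inventory = ["10.0.0.5", "10.0.0.6", "10.0.0.7"]
    monitored = ["10.0.0.5"]

    for device in devices_to_add(inventory, monitored):
        # Placeholder for whatever add-node step your NMS provides.
        print(f"needs adding: {device}")
```

Run from cron, something along these lines would at least tell you what is missing, leaving the actual add step to whatever your NMS offers on the command line.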
PostgreSQL is the supported DB of choice for this project. We haven’t at present migrated over to PostgreSQL, which means that the maintenance of this solution would be higher for us than a MySQL based back end. That’s something to bear in mind in any solution you may migrate to or implement. PostgreSQL is gaining in popularity and features, so this at some point will become a moot issue.
Wow! Zenoss appears to have improved quite a bit since the last version that I tested and looks to be highly recommended now. Its features include a comprehensive API to allow integration with existing systems, which would make setting up the monitoring of new devices quite easy. Installation on platforms with supported binaries appears to be straightforward, along with the configuration and setup of your first ‘Devices’ that need to be monitored. Auto-discovery is still an option and is more intelligent than in OpenNMS. It provides a handy feature of ‘walking’ your network via routers to find all devices located on your network, which is quite a powerful feature on its own.
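To illustrate the general idea of ‘walking’ a network, here is a small Python sketch of a breadth-first walk starting from one router. This is not how Zenoss implements it; the topology dictionary and device names are made-up assumptions standing in for whatever a real discovery would learn from each router’s tables.

```python
from collections import deque


def walk_network(seed, neighbours):
    """Breadth-first walk from a seed device.

    `neighbours` maps a device name to the devices it knows about.
    Returns every reachable device in the order it was discovered.
    """
    seen = {seed}
    queue = deque([seed])
    discovered = []

    while queue:
        device = queue.popleft()
        discovered.append(device)
        for nxt in neighbours.get(device, []):
            # Skip anything already found so loops between routers
            # do not cause the walk to run forever.
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    return discovered


if __name__ == "__main__":
    # Hypothetical topology for illustration only.
    topology = {
        "router-a": ["switch-1", "router-b"],
        "router-b": ["server-1", "router-a"],
    }
    print(walk_network("router-a", topology))
```

The point is simply that one known router is enough of a starting place: each hop reveals more devices until nothing new turns up.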
It supports the ability to expand your single monitoring server to a High Availability solution. Whilst this isn’t quite out of the box, it really isn’t a complex setup for a Linux Sysadmin (Setup Guide). This enables you to grow your monitoring environment as your infrastructure grows, or to provide a level of redundancy to ensure that you know what is going on 24×7.
The changes and improvements that have been made since my last evaluation of Zenoss mean that it’s about time that I tested it again. If it became a viable alternative then a lot of work would have to go into the migration from Nagios to Zenoss, but it looks like it could be worthwhile.
I’m not a big fan of the Documentation for Zabbix. Everything is dumped into a single PDF, which makes it difficult to filter out what is part of configuration and what is part of administration. For instance, to refresh my memory on whether you could add a single host into the setup via the web administration, I checked the documentation. Now this was a brief check, and it was 00:15, but could I find anything other than auto-discovery? Nope, not a single thing. The system of course does allow this, you just have to struggle in the docs to find it. Not a great start, but not a show stopper if an API was available. It doesn’t appear to be: I can see comments about this on the forum, but so far nothing seems to have materialised.
Repeat notifications have now been implemented since I last tested Zabbix. This is something that was sorely lacking in previous versions and is a must for any NOC, especially if you’re using email/sms/pager alerting. These methods are inherently unreliable when it comes to critical services, so sending more than one alert is always a handy feature. Before, Zabbix would have sent a single alert when a device went down and that was it. If you didn’t get that alert for whatever reason then you would be unaware of any issues until someone logged into the administration system.
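For anyone unsure what repeat notifications boil down to, here is a minimal Python sketch of the logic: alert when a device goes down, then again every so often until it recovers. The class name and the 300 second interval are assumptions for illustration, not anything taken from Zabbix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepeatNotifier:
    """Decides when an alert should be (re)sent for one device."""

    interval: float                     # seconds between repeat alerts
    last_sent: Optional[float] = None   # time of the last alert, if any

    def should_notify(self, is_down: bool, now: float) -> bool:
        if not is_down:
            # Device recovered: reset so the next outage alerts at once.
            self.last_sent = None
            return False
        if self.last_sent is None or now - self.last_sent >= self.interval:
            self.last_sent = now
            return True
        return False


if __name__ == "__main__":
    notifier = RepeatNotifier(interval=300)
    # Simulated checks once a minute for ten minutes of downtime.
    for t in range(0, 601, 60):
        if notifier.should_notify(is_down=True, now=t):
            print(f"alert sent at t={t}s")
```

With a single-shot system only the first of those alerts would ever go out, which is exactly the gap described above.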
Distributed monitoring is included and seems to be extremely simple to set up. This is one of the better features of Zabbix and something worth considering if it is a requirement for your environment. In general Zabbix seems to have improved quite a lot since the last testing I did, but the restrictive admin interface means it isn’t something that I would really consider in a live environment.
Finally, a comprehensive list comparing the available NMS applications is currently hosted on Wikipedia, so go and check it out for more NMS applications.