Nagios alternatives, are there any?

Nagios is one of the standard NMS (Network Monitoring Systems) available to businesses today. Its wealth of features provides a very flexible system, and it’s scalable and customisable, so it should fit into the most demanding environments. Is this all that’s available to assist in keeping an eye on your IT infrastructure 24×7?

No, that’s the simple answer. The longer answer is more complicated and depends on what your System Administrators can support from an application point of view. A lot of the NMS applications available are your standard Perl / C implementation on a flat file or MySQL/PostgreSQL backend, so they should be supported on most systems. Others, however, require Java or Python support. Whilst this is simple enough to install and run on most Linux distributions, what happens when things go wrong? How many companies have IT staff who have taken the time to learn Java/Python for the next big Web 2.0 site?

I’ve used Nagios for a while now (some 5 years) and have regularly gone back to the start and looked over the alternatives to see what they offer in terms of features and integration. Integration is the key for me when it comes to choosing an NMS: if it can’t offer a level of integration with the systems we already have (and we’re willing to do some work to make that happen), then it’s a non-starter no matter how laden with features it might be. Over the past couple of years a wider choice has become available, which makes choosing to switch that much harder! Having the luxury of time to test these newer systems isn’t something that’s available to everyone, including me, so my ‘experience’ of other systems is limited compared to my time with Nagios. However, I know what I want and need, so it doesn’t always take long before I consign an app to the dusty code graveyard.

Let’s get into some of the more popular NMS applications available at present and what my impression of them has been.

OpenNMS

OpenNMS is actually a really nice application to use for Network Monitoring. Its discovery feature works really well, and being configurable from XML files makes it extremely easy to set up and maintain. Installation is relatively straightforward if you are using the pre-packaged versions available or are a dab hand with Tomcat. The range that I pre-configured for network discovery worked fine and it detected all devices which responded to ICMP. Monitoring of individual services/interfaces was simple, if time-consuming when you have a large selection of devices to monitor.
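
To give a feel for what a discovery range amounts to, here is a minimal Python sketch that expands a begin/end pair into the list of addresses a ping sweep would have to cover, and groups them into batches. This is not OpenNMS code or its configuration format; the function names and the example addresses are made up purely for illustration.

```python
import ipaddress
from typing import List


def expand_range(first: str, last: str) -> List[str]:
    """Return every IPv4 address from first to last, inclusive."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    if end < start:
        raise ValueError("range end is below range start")
    return [str(ipaddress.IPv4Address(n)) for n in range(int(start), int(end) + 1)]


def split_into_batches(addresses: List[str], size: int) -> List[List[str]]:
    """Group addresses into batches so a sweep can be spread out over time."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [addresses[i:i + size] for i in range(0, len(addresses), size)]


if __name__ == "__main__":
    # Hypothetical range that crosses a /24 boundary.
    targets = expand_range("192.168.1.250", "192.168.2.2")
    for batch in split_into_batches(targets, 4):
        print(batch)
```

Even a modest range grows quickly, which is why the per-device service and interface checks end up being the time-consuming part.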

The bad point for me is that auto-discovery is the primary way of adding new devices to be monitored. You can add new devices via the command line on the NMS server, which isn’t too much of an issue depending on how many new devices are added to your infrastructure on a day-to-day basis. If it’s a sizable number then this isn’t going to be an option for long, and with no way to add single devices via a web interface, your only remaining option is some form of integration by way of a script. The next problem is managing the individual services/interfaces that are available for a particular device; this again appears to be a manual process with no easy way to integrate it into your current NOC.

PostgreSQL is the supported DB of choice for this project. We haven’t at present migrated over to PostgreSQL, which means that the maintenance of this solution would be higher for us than a MySQL-based back end. That’s something to bear in mind in any solution you may migrate to or implement. PostgreSQL is gaining in popularity and features, so at some point this will become a moot issue.

Zenoss

Wow! Zenoss appears to have improved quite a bit since the last version that I tested and looks to be highly recommended now. Its features include a comprehensive API to allow integration with existing systems, which would make setting up the monitoring of new devices quite easy. Installation on platforms with supported binaries appears to be straightforward, along with the configuration and setup of the first ‘Devices’ that need to be monitored. Auto-discovery is still an option and is more intelligent than in OpenNMS: it provides a handy feature of ‘walking’ your network via routers to find all devices located on it, which is quite a powerful feature on its own.
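
The idea of ‘walking’ a network from router to router can be sketched as a breadth-first search. The Python below is only an illustration of the concept, not how Zenoss implements it: the neighbour table is a hard-coded, hypothetical dictionary, whereas a real tool would build it by querying each router as it goes.

```python
from collections import deque
from typing import Dict, List


def walk_network(seed: str, neighbours: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk from a seed router, returning devices in discovery order."""
    seen = {seed}
    order = [seed]
    queue = deque([seed])
    while queue:
        device = queue.popleft()
        # Devices with no entry in the table are treated as leaf hosts.
        for peer in neighbours.get(device, []):
            if peer not in seen:
                seen.add(peer)
                order.append(peer)
                queue.append(peer)
    return order


# Hypothetical topology: which devices each router reports as reachable.
TOPOLOGY = {
    "core-router": ["edge-router-a", "edge-router-b"],
    "edge-router-a": ["web-01", "web-02", "core-router"],
    "edge-router-b": ["db-01", "core-router"],
}

if __name__ == "__main__":
    for name in walk_network("core-router", TOPOLOGY):
        print(name)
```

The `seen` set is what stops the walk from looping when two routers list each other, which they nearly always will.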

It supports expanding your single monitoring server into a High Availability solution. Whilst this isn’t quite out of the box, it really isn’t a complex setup for a Linux Sysadmin (Setup Guide). This enables you to grow your monitoring environment as your infrastructure grows, or to provide a level of redundancy to ensure that you know what is going on 24×7.

The changes and improvements that have been made since my last evaluation of Zenoss mean that it’s about time I tested it again. If it became a viable alternative then a lot of work would have to go into the migration from Nagios to Zenoss, but it looks like it could be worthwhile.

Zabbix

I’m not a big fan of the documentation for Zabbix. Everything is dumped into a single PDF, which makes it difficult to filter out what is part of configuration and what is part of administration. For instance, to refresh my memory on whether you could add a single host into the setup via the web administration, I checked the documentation. Now this was a brief check (it was 00:15), but could I find anything other than auto-discovery? Nope, not a single thing. The system of course does allow this; you just have to struggle through the docs to find it. Not a great start, but not a show stopper if an API was available. It doesn’t appear to be: I can see comments about this on the forum, but nothing seems to have materialised so far.

Repeat notifications have been implemented since I last tested Zabbix. This was sorely lacking in previous versions and is a must for any NOC, especially if you’re using email/SMS/pager alerting. These methods are inherently unreliable when it comes to critical services, so sending more than one alert is always a handy feature. Before, Zabbix would have sent a single alert when a device went down, and that was it: if you didn’t get that alert for whatever reason then you would be unaware of any issues until someone logged into the administration system.
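
As a rough illustration of why repeat notifications matter, here is a small Python sketch of the scheduling logic. It is not Zabbix code or configuration; the function name, the parameters and the numbers are all assumptions chosen for the example.

```python
from typing import List


def alert_times(down_for: int, first_delay: int, repeat_every: int, max_alerts: int) -> List[int]:
    """Return the minutes-since-failure at which alerts should have been sent.

    down_for     -- how long the device has been down, in minutes
    first_delay  -- minutes to wait before the first alert
    repeat_every -- minutes between repeat alerts
    max_alerts   -- upper limit on alerts for a single outage
    """
    if repeat_every < 1:
        raise ValueError("repeat_every must be at least 1 minute")
    times = []
    t = first_delay
    while t <= down_for and len(times) < max_alerts:
        times.append(t)
        t += repeat_every
    return times


if __name__ == "__main__":
    # Single-alert behaviour: one message, then silence.
    print(alert_times(down_for=50, first_delay=5, repeat_every=15, max_alerts=1))
    # Repeat notifications: a lost message is followed by another.
    print(alert_times(down_for=50, first_delay=5, repeat_every=15, max_alerts=10))
```

With `max_alerts=1` a single lost email or SMS means nobody hears about a 50-minute outage; with repeats, three more alerts go out in the same window.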

Distributed monitoring is included and seems to be extremely simple to set up. This is one of the better features of Zabbix and worth considering if it is a requirement for your environment. In general Zabbix seems to have improved quite a lot since the last testing I did, but the restrictive admin interface means it’s not something that I would really consider for a live environment.

Finally, a comprehensive comparison of the available NMS applications is currently hosted on Wikipedia. Go and check it out for a list of more NMS applications.

http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems

15 thoughts on “Nagios alternatives, are there any?”

  1. I wrote a big paper last summer comparing Nagios, OpenNMS and Zenoss along with a quick look at other things like Cacti and the Dude. Conclusion in September 08 was that Zenoss was just ahead. The paper is available at http://www.skills-1st.co.uk/papers/jane/open_source_mgmt_options.html

    At the time, one of my main criticisms was that the code felt a little unstable; I’ve persevered with it and moved from 2.2 to 2.2.4 and just gone to 2.3.2. Have to say that the stability feels much better now.

    As you comment on Zabbix, the documentation is not good though Zenoss do seem to be gradually addressing this and are doing a major re-write. I have also just completed the first draft of a paper that elaborates greatly on the Zenoss Event Management system, which includes lots of screenshots and examples. Comments on this draft would be much appreciated. The paper is available either from the Zenoss Wiki at http://www.zenoss.com/community/wiki/events-documentation-and-examples/ or from our own website at http://www.skills-1st.co.uk/papers/jane/zenoss_event_management_paper.pdf

    Cheers,
    Jane

  2. Excellent papers there Jane, you’ve put quite a bit more effort into these than I could spare the time to do.

    It does look more and more like Zenoss is the choice for people starting out these days, the size of our Nagios systems means a migration to any other software is going to be a massive job but if I was doing it all over again……

  3. Hi Mark,

    Thanks for popping by. I’m going to do a bit more of an in depth tutorial on all of the ones I’ve setup demos for so will be posting more about Zenoss in the future 🙂

    Neil

  4. Yes, of course there are!
    One of them is Osmius, still improving, but it’s worth it to take a look:
    http://osmius.net

    – Intrusive and non-intrusive monitoring.
    – Its own agent development framework.
    – SLA, data warehouse included.
    – Planned downtimes and time shifts.
    – Subscriptions and on-demand notifications.
    – Simple, easy concepts to understand.

    Next release around February the 6th, 2009 on SourceForge.

  5. Hi Joselu,

    Thanks for the info on your software. At the moment though I can’t test it to give my verdict as I can’t get it installed on a Default CentOS 5 box using the auto installer. If you can get that fixed then I’m willing to evaluate it and also to put a demo up online for people.

    I am attempting to run the software on non-standard ports but I wouldn’t have thought that this would break anything.

    Do you accept this license? [y/n]: y

    —————————————————————————-
    Installation folder

    Please, choose a folder to install Osmius

    Select a folder [/opt/osmius]:

    —————————————————————————-
    MySQL Information

    Please enter your MySQL database information:

    MySQL Server port [3306]: 3308

    —————————————————————————-
    MySQL Credentials

    Please enter your database root user password

    MySQL Server root password :
    Re-enter password :
    —————————————————————————-
    Tomcat Port Configuration

    Please enter the Tomcat configuration parameters you wish to use.

    Tomcat Server Port: [8080]: 8090

    Tomcat Shutdown Port: [8005]: 8006

    Tomcat JMX Port: [1099]: 1100

    —————————————————————————-
    Osmius Server IP

    Main Osmius Server IP [127.0.0.1]:

    —————————————————————————-
    Setup is now ready to begin installing Osmius on your computer.

    Do you want to continue? [Y/n]:

    —————————————————————————-
    Please wait while Setup installs Osmius on your computer.

    Installing
    0% ______________ 50% ______________ 100%
    ########################################
    Error: Error running /opt/osmius/mysql/scripts/myscript.sh /opt/osmius/mysql
    password : /bin/sh: /opt/osmius/mysql/scripts/myscript.sh: cannot execute binary
    file
    Press [Enter] to continue :
    Warning: Problem running post-install step. Installation may not complete
    correctly
    Error running /opt/osmius/mysql/bin/mysql -h localhost -P 3308 -u root
    -ppassword -S ../tmp/mysql.sock < /opt/osmius/osmius/data/osm_create9.01.sql :
    /bin/sh: /opt/osmius/mysql/bin/mysql: cannot execute binary file
    Press [Enter] to continue :
    #

    —————————————————————————-
    Setup has finished installing Osmius on your computer.

    View Readme file? [Y/n]: n

  6. great post!

    I currently use zabbix to monitor a really large environment (more than 320 servers)

    I’ve found a wonderful plugin that is more than a plugin; the other monitoring systems have nothing similar, and nothing that goes so deeply inside Oracle.

    In the hope that someone finds my comment useful:

    http://www.smartmarmot.com

    Here you will find Orabbix, open source and released under GPL3.

  7. Hi,

    With reference to the Osmius error, typically these errors in the installation of Osmius are due to insufficient RAM. We recommend at least 2 GB of RAM.

    Anyway, I see from the error that you were using version 9.01 of Osmius. There is a new Osmius release, version 10.10, which also incorporates new features.

    Can you provide us with more information about how much RAM you had?

    Thanks in advance.

  8. Shinken, http://www.shinken-monitoring.org, is an alternative to the standard Nagios. Easy to try, even using an existing Nagios configuration. Mostly for people who like Nagios but want more. 😉

    100% Python implementation based on Pyro (distributed framework)
    Nagios NEB Livestatus API compatible so it can be used with most of the Nagios UIs (Multisite, Thruk, Nagvis and Shinken’s own)

    Provides business impact assessment in addition to the typical state evaluation

    Compatible with Nagios UIs, configurations, plugins and has a host of modules (Thrift, NSCA, NRPE, Graphite, PNP4nagios, SQL, etc.)
    100% open source and runs on Windows + Linux natively.

    Combined with nconf, graphite/carbon/whisper and nagvis it is a solid offering.

    ps. Great post. Learned some about Zabbix.

  9. All three are great alternatives and have their advantages, but you may not need to switch altogether. You mentioned you like the documentation from Zabbix and easy deploy/config of OpenNMS. I would keep Nagios around and add BigPanda to the monitoring stack. BigPanda will give the documentation and scalability you are looking for with its out-of-the-box integration with Nagios: https://bigpanda.io/integrations/nagios-the-alternative-to-a-flood-of-alerts
    I’d take a look.
