Speeding up Nagios I. | IT admin blog

If we take networking seriously, we have to monitor the network's status and the health of its services. In small networks with a few dozen devices this is fairly simple and the default configuration is sufficient. But what if the network has “grown a little bit” to several hundred devices and several thousand services, and Nagios starts running out of breath?

I had been living with this problem for quite a long time. I kept postponing it, telling myself that “it still works, it just needs an update or upgrade”. Until today, when I finally took a closer look.

Our corporate Nagios monitors a network consisting of dozens of routers, switches, radios and other jewelry spread over an area of three or four districts. Of course, with so many devices it has a lot of work to do to run through the whole network and check it. The problem was that this load affected latency and jitter during basic work on the server over SSH and also distorted some latency measurements, resulting in false alarms.

What was slowing Nagios down? I/O operations. Using the iotop utility (from the package of the same name), I confirmed that most of the I/O load was generated by Nagios. A few well-aimed questions to Google turned up a solution: move the spool directory to a ramdisk. And it helped.
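For reference, a check along these lines can show whether Nagios is the heavy I/O user. It is only a sketch: the function name nagios_io_snapshot is made up for illustration, and it assumes iotop is installed and that the Nagios processes have "nagios" somewhere in their command line.

```shell
#!/bin/sh
# Take a short batch-mode iotop sample and keep only lines mentioning Nagios.
# iotop typically needs root privileges; the function reports and returns 1 otherwise.
nagios_io_snapshot() {
    if ! command -v iotop >/dev/null 2>&1; then
        echo "iotop is not installed" >&2
        return 1
    fi
    if [ "$(id -u)" -ne 0 ]; then
        echo "iotop needs root privileges" >&2
        return 1
    fi
    # -b: batch (non-interactive) mode, -o: only processes actually doing I/O,
    # -n 3: three samples, -d 5: five seconds between samples
    iotop -b -o -n 3 -d 5 | grep -i nagios
}
```

Calling nagios_io_snapshot as root prints the matching lines, if any. An empty result means no Nagios process did I/O during the sample window.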

Let’s see how to do this. First, create a mount point, e.g. /var/ramdrive.

mkdir /var/ramdrive

Define the filesystem in /etc/fstab.

tmpfs /var/ramdrive tmpfs size=128M,mode=0755,uid=1001,gid=1001 0 0
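The uid and gid values in the entry above are specific to each installation. A minimal sketch of how the line could be generated from an account name instead of being typed by hand; the helper name fstab_line is hypothetical, and "nagios" as the account name in the usage note is an assumption.

```shell
#!/bin/sh
# Print an /etc/fstab line for a tmpfs owned by the given account.
# Usage: fstab_line <user> <mount point> <size>
fstab_line() {
    user=$1
    mountpoint=$2
    size=$3
    # id -u / id -g print the numeric user ID and primary group ID
    uid=$(id -u "$user") || return 1
    gid=$(id -g "$user") || return 1
    printf 'tmpfs %s tmpfs size=%s,mode=0755,uid=%s,gid=%s 0 0\n' \
        "$mountpoint" "$size" "$uid" "$gid"
}
```

For example, fstab_line nagios /var/ramdrive 128M prints a line that can be reviewed and then appended to /etc/fstab.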

A ramdisk size of 128 MB is big enough to hold all the necessary data. Replace uid=1001,gid=1001 with the UID and GID of your installation’s nagios user. Mount the ramdisk and create the basic file structure.

mount /var/ramdrive
mkdir -p /var/ramdrive/spool/checkresults
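Two things are worth keeping in mind here. A tmpfs is empty again after every reboot, so the directory structure has to be recreated before Nagios starts. And directories created as root belong to root, so the nagios user may not be able to write into them. A minimal sketch covering both points; the function name prepare_ramdrive is hypothetical, and the owner nagios:nagios in the usage note is an assumption.

```shell
#!/bin/sh
# Recreate the directory layout Nagios expects on the ramdisk.
# Usage: prepare_ramdrive <mount point> [owner]
prepare_ramdrive() {
    base=$1
    owner=${2:-}
    mkdir -p "$base/spool/checkresults" || return 1
    # Optionally hand the spool tree over to the monitoring user
    if [ -n "$owner" ]; then
        chown -R "$owner" "$base/spool" || return 1
    fi
}
```

Run as root after mounting, e.g. prepare_ramdrive /var/ramdrive nagios:nagios. The same call could be placed in whatever runs before Nagios at boot.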

Now change the parameters in nagios.cfg.

object_cache_file=/var/ramdrive/objects.cache
status_file=/var/ramdrive/status.dat
check_result_path=/var/ramdrive/spool/checkresults
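The three settings above can also be applied with a small script rather than by hand. This is a minimal sketch, assuming the three keys already exist uncommented in the file. The function name point_to_ramdrive is hypothetical, and the config path in the usage note is an assumption that varies by distribution.

```shell
#!/bin/sh
# Rewrite the three path settings in a nagios.cfg-style file so they point
# at the ramdisk. Lines for other settings are left untouched.
# Usage: point_to_ramdrive <config file> <ramdisk mount point>
# Note: the mount point must not contain '|' or '&' (sed metacharacters here).
point_to_ramdrive() {
    cfg=$1
    base=$2
    tmp=$(mktemp) || return 1
    if sed \
        -e "s|^object_cache_file=.*|object_cache_file=$base/objects.cache|" \
        -e "s|^status_file=.*|status_file=$base/status.dat|" \
        -e "s|^check_result_path=.*|check_result_path=$base/spool/checkresults|" \
        "$cfg" > "$tmp"
    then
        # Write back in place so the file keeps its owner and permissions
        cat "$tmp" > "$cfg"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

For example, point_to_ramdrive /etc/nagios/nagios.cfg /var/ramdrive, after making a backup copy of the file. Running nagios -v against the config file before restarting is a cheap way to catch a missing directory or a typo.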

If we collect and visualize performance data with PNP4Nagios, we can set it up to save its data to the ramdisk in the same way. After a restart with /etc/init.d/nagios restart, the system will be faster, because all the time-consuming, I/O-heavy operations now take place in the ramdisk.

I admit that this is only a short-term solution, good until this setup also hits its limits and distributed monitoring becomes inevitable, but for now it is enough 🙂

Andy says:

Tuesday August 27th, 2013 at 08:17 PM

Hi,

Quite a nice solution. Honestly, I don't know whether this is what causes the problem, but I fairly often see something like ping dropouts… We'll see whether it's related to this.

PS: Fix the paths in the guide… In the example of what to change in nagios.cfg the paths are “/var/ramdrive”, and elsewhere you have “/mnt/ramdisk”, so either a symlink is missing there, or the paths just need fixing…

backslash says:

Wednesday August 28th, 2013 at 08:33 AM

Thanks for the comment. Honestly, I wrote the article on the fly, so to speak, hence the differing mount points; I have already fixed it.
As for the ping dropouts, I dealt with something similar too, and I will write a blog post about it shortly.

Speeding up Nagios II. | IT admin blog says:

Thursday November 7th, 2013 at 09:21 PM

[…] previous article of this mini-series we showed how to dramatically speed up Nagios by placing the spool directory […]
