Monitoring Sun Grid Engine with Xymon

After a recent job switch I’ve had the opportunity to set up Xymon from scratch and start developing even more scripts for new pieces of software and workflows. One of my first tasks was to set up a new “cluster” using all the software I felt most comfortable with, as a showcase to determine whether my preferred tools worked as well as or better than the ones currently in use. After a week or so of setting up a full Cobbler installation, Xymon and my own Glovebox, I presented it to my new employers and got a positive response.

After all that work I wanted to make sure the people using the system were as happy as possible with the monitoring. One request came up more than any other: the ability to easily see the status of Sun’s Grid Engine running on the cluster. I immediately set to work and came up with a quick solution using Xymon and a script that parses the output of ‘qstat -f’. As with my Xen monitoring script, it runs in one place and sends in data for all the associated machines. That means each execution node gets a column with just its own information, and the qmaster gets a combined column. The output is as follows:

Total Slots: 2
Total Used Slots: 2

Queue: MainQueue
Slots: 2
Reserved Slots: 0
Used Slots: 2

The master’s column has the same layout, but contains information for the entire cluster. When all slots are taken, the test icon/color turns to ‘green’; when some are available, it changes to ‘clear’. I’ve also created an easy-to-understand graph: a green area shows the total number of slots, with a red area drawn over it showing the used slots. As slots free up, the green shows through, indicating that slots are available.
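For readers who want to build something similar, here is a rough sketch of the parsing step in Python. It is not the script from this post. It assumes each queue-instance line of ‘qstat -f’ looks like "queue@host qtype resv/used/tot. ...", which may differ between Grid Engine versions, and the names parse_qstat and render are made up for the example.

```python
import re

# Assumed layout of a queue-instance line in `qstat -f` output, e.g.
#   MainQueue@node01   BIP   0/2/2   0.01   lx24-amd64
# where the third column is reserved/used/total slots.
QUEUE_LINE = re.compile(r"^(\S+)@(\S+)\s+\S+\s+(\d+)/(\d+)/(\d+)\s")


def parse_qstat(text):
    """Return {host: [(queue, reserved, used, total), ...]}."""
    hosts = {}
    for line in text.splitlines():
        m = QUEUE_LINE.match(line)
        if m:  # header, separator and job lines do not match
            queue, host = m.group(1), m.group(2)
            resv, used, total = (int(m.group(i)) for i in (3, 4, 5))
            hosts.setdefault(host, []).append((queue, resv, used, total))
    return hosts


def render(queues):
    """Format a list of queue tuples in the report layout shown above."""
    lines = [
        "Total Slots: %d" % sum(q[3] for q in queues),
        "Total Used Slots: %d" % sum(q[2] for q in queues),
    ]
    for queue, resv, used, total in queues:
        lines += ["", "Queue: %s" % queue, "Slots: %d" % total,
                  "Reserved Slots: %d" % resv, "Used Slots: %d" % used]
    return "\n".join(lines)
```

Calling render(hosts["node01"]) gives the body for one execution node. Passing the queue tuples of every host in one list gives cluster-wide totals for the qmaster column, although this sketch lists each queue instance separately instead of merging them by queue name.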

The script is available for download here.

The graph definitions are:

[sge]
TITLE SGE Used Slots
YAXIS # Used Slots
DEF:SlotsUsed=sge.rrd:TotalUsedSlots:AVERAGE
DEF:SlotsTotal=sge.rrd:TotalSlots:AVERAGE
AREA:SlotsTotal#00CC00:Total Slots
AREA:SlotsUsed#FF0000:Used Slots
COMMENT:\n
GPRINT:SlotsUsed:LAST:Used Slots  : %5.1lf (cur)
GPRINT:SlotsUsed:MAX: : %5.1lf (max)
GPRINT:SlotsUsed:MIN: : %5.1lf (min)
GPRINT:SlotsUsed:AVERAGE: : %5.1lf (avg)\n
GPRINT:SlotsTotal:LAST:Total Slots   : %5.1lf (cur)
GPRINT:SlotsTotal:MAX: : %5.1lf (max)
GPRINT:SlotsTotal:MIN: : %5.1lf (min)
GPRINT:SlotsTotal:AVERAGE: : %5.1lf (avg)\n

Update: I’ve modified my script to NOT change colors. It seems Xymon ignores whatever you send in with a ‘clear’ status, since it treats ‘clear’ as meaning there is no data, and that was breaking my graphs.
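To show what a fixed color looks like in practice, here is a small Python sketch that builds a Xymon status message which is always ‘green’. It is not taken from my script. The function name build_status is hypothetical, and the test name "sge" is an assumption based on the graph definition above.

```python
def build_status(host, body, test="sge", color="green"):
    """Build a Xymon status message with a fixed color.

    Xymon expects dots in the hostname to be replaced with commas
    in the "host.test" part of a status message.
    """
    # Color stays constant so every report is still treated as data.
    return "status %s.%s %s\n%s" % (host.replace(".", ","), test, color, body)
```

The resulting string would then be handed to the xymon client command together with the server address. On a typical client those come from the $XYMON and $XYMSRV environment variables, but check your own setup.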
