sysusage - Man Page
System Monitoring Tool
Description
SysUsage is a tool used to continuously monitor a system and generate daily, weekly, monthly and yearly graphical reports using rrdtool and sar.
Features
SysUsage generates graphical reports on all system activity information. Its periodic reports let you keep track of the machine's activity over its lifetime and are a great help for performance analysis and resource management.
SysUsage can be run periodically, from a 10-second cycle in daemon mode to a cycle of 1 minute or more using crond.
SysUsage can be run from a central server that executes the sysusage Perl script remotely over ssh, so that collected data is stored in this central place. You then have just one place where rrdtool and the related Perl modules need to be installed, and just one place where sysusagegraph or sysusagejqgraph needs to be executed.
CPUs
- CPUs distribution usage (user, nice, system).
- CPUs global usage (total cpu used, iowait).
- CPUs virtualized usage (steal, guest).
Memory
- Memory usage (with and without cache).
- Swap usage (with and without cache).
- Amount of memory needed for current workload.
- POSIX shared memory.
- Hugepages utilisation.
- Active versus inactive memory.
- Dirty memory that needs to be written to disk.
I/O
- Context switches per second.
- Interrupts per second.
- Page swapping.
- Page I/O stats.
- I/O request stats.
- I/O block stats.
Network
- TCP connections per second.
- TCP segments per second.
- Number of sockets in use (Total, TCP and UDP).
- Number of sockets in TIME_WAIT state.
- Active network interface usage.
- Active network interface bad packets, drops and collisions.
Devices
- CPU time for I/O on device.
- Read/Write sectors on device.
- Disk throughput on device.
- I/O workload on device.
- Times for I/O requests issued to device.
- Hard drive temperature if your hardware supports it (with hddtemp).
- Motherboard/CPU/Remote temperature reported by sensors or sar.
- Fan RPM reported by sensors.
Files
- Number of open files.
- Number of files in a queue directory.
- Disk space used on mounted partitions.
Process
- Load average.
- Processes created per second.
- Number of running processes (e.g. sendmail, httpd, oracle, etc.).
- Number of running threads (e.g. mysqld, amarok, etc.).
- Number of tasks blocked waiting for I/O.
Notification
You can receive mail or Nagios notifications when monitored values fall outside the max/min threshold values, for all types of monitoring.
Plugins
With SysUsage you can create your own monitoring plugins. Any script or program can be embedded in SysUsage provided that it returns up to 3 numeric values. The graph title and labels are defined in the configuration file.
Remote call
SysUsage can be installed and run on a central server that stores statistics data by periodically calling sysusage on remote hosts using SSH. This central place is also in charge of rendering HTML pages and graphs for all hosts. This simplifies the SysUsage installation on remote hosts, which only require sysstat and rsysusage.
Requirement
rrdtool
You need to install rrdtool. Most distributions should provide a dedicated package for rrdtool. On CentOS/RedHat distributions, use the following command:
yum install rrdtool rrdtool-perl
On Debian/Ubuntu distributions, use the command:
apt-get install rrdtool librrds-perl
The sources can be found here:
http://people.ee.ethz.ch/~oetiker/
If you compile from sources and want to use the RRDs perl module embedded with it, you must use the following command to compile:
make site-perl-install
This installation is optional if sysusage is installed on a remote host.
sysstat
You also need sar to collect statistics. Sar is part of the sysstat package. For RPM-like distributions:
yum install sysstat
and for Debian-like distributions:
apt-get install sysstat
The sources can always be found here:
http://freshmeat.net/projects/sysstat/
If you plan to use threshold notification you must have Net::SMTP installed.
yum install perl-Net-SMTP-SSL
or
apt-get install libnet-smtp-ssl-perl
Sources can be found on CPAN (https://metacpan.org/pod/Net::SMTP)
Perl modules
SysUsage can be run in a central place to collect remote sysusage statistics using ssh. The remote calls are processed simultaneously using fork with the Proc::Queue Perl module.
If you plan to use sysusagegraph instead of sysusagejqgraph you will also need the GD and GD::Graph3D Perl modules. Note that the use of GD and GD::Graph is deprecated and sysusagegraph will be removed in the next major release (6.0).
All these modules are available from CPAN (https://metacpan.org/) and should at least be installed on the central server. On remote hosts this is optional and depends on whether you want to run SysUsage on each server or over ssh from a central place.
Nagios nsca client (optional)
If you want to send messages to Nagios you need to install nsca-2.7.2.tar.gz or a more recent version. You can get it here:
http://sourceforge.net/projects/nagios/files/
hddtemp and sensors (optional)
If you want to monitor your hard drive temperature you must install a small utility called hddtemp. You can download it from http://download.savannah.gnu.org/releases/hddtemp/. Run it to see if your hard drive has a temperature sensor.
You can also use sensors to monitor your CPU temperature and fan speed. If your hardware supports it, run sensors-detect and load the required kernel modules at boot time.
Installation
Quick install
Simply run the following commands:
perl Makefile.PL
make && make install
By default it will copy the Perl programs into /usr/local/sysusage/bin and the HTML output will be written to /var/www/htdocs/sysusage/. The configuration file is /usr/local/sysusage/etc/sysusage.cfg and all RRD database files from rrdtool will be saved under /usr/local/sysusage/rrdfiles.
If you plan to run sysusage on different servers from a central place you may just want to install the rsysusage Perl script on remote hosts. In that case proceed as follows:
perl Makefile.PL REMOTE=1
make && make install
It will copy only the rsysusage script into /usr/local/sysusage/bin and the configuration file to /usr/local/sysusage/etc/sysusage.cfg. The RRD data directory will be created under /usr/local/sysusage/rrdfiles, but only to hold the *.cnt files that count alert attempts when a threshold is exceeded.
Custom install
You can override all install paths with the following Makefile.PL arguments. Here are the default values:
BINDIR=/usr/local/sysusage/bin
CONFDIR=/usr/local/sysusage/etc
PIDDIR=/usr/local/sysusage/etc
BASEDIR=/usr/local/sysusage/rrdfiles
PLUGINDIR=/usr/local/sysusage/plugins
HTMLDIR=/var/www/htdocs/sysusage
MANDIR=/usr/local/sysusage/doc
DOCDIR=/usr/local/sysusage/doc
REMOTE=
For example, on a RedHat system you may prefer to install SysUsage like this:
perl Makefile.PL BINDIR=/usr/bin CONFDIR=/etc PIDDIR=/var/run \
    BASEDIR=/var/lib/sysusage HTMLDIR=/var/www/html/sysusage \
    MANDIR=/usr/man/man1 DOCDIR=/usr/share/doc/sysusage
If you are installing sysusage on a host that will be called over ssh from a central place, you may want to install just what is necessary and nothing more:
perl Makefile.PL BINDIR=/usr/bin CONFDIR=/etc PIDDIR=/var/run \
    MANDIR=/usr/man/man1 DOCDIR=/usr/share/doc/sysusage \
    REMOTE=1
This will just install the rsysusage Perl script, the configuration file and the documentation, so you don't need to install extra Perl modules and other graphics-related things.
Package/binary install
In the packaging/ directory you will find all scripts to build RPM, SlackBuild and Debian packages. See the README in this directory to learn how to build these packages.
Usage
SysUsage consists of two main Perl scripts, sysusage and sysusagegraph. Once you have correctly installed and configured SysUsage, the best way to execute them is by setting up a cron job. If you prefer JavaScript graphs instead of GD::Graph images, use sysusagejqgraph, which is based on the jqplot JavaScript library. This is the recommended script, as the use of GD::Graph through sysusagegraph is deprecated.
sysusage
The sysusage script is responsible for collecting system information at a given interval and storing it in rrdtool database files.
As it is very fast you can set the running interval to 1 minute. This is the default polling interval used in the configuration and graph reports. If you change this interval you must also change it in the configuration file, otherwise your graphs will be wrong. See the INTERVAL configuration directive.
Here is how I use it with a default installation:
*/1 * * * * /usr/local/sysusage/bin/sysusage > /dev/null 2>&1
rsysusage
This script does the same things as the sysusage Perl script but, instead of storing collected data in files, it dumps it to the standard output. It is used instead of the sysusage Perl script through an ssh call from a central server, where the local sysusage stores the statistics retrieved from multiple servers.
/usr/local/sysusage/bin/rsysusage -r remote_hostname
Where 'remote_hostname' is the hostname given in the [REMOTE ...] configuration section.
sysusagegraph (deprecated) / sysusagejqgraph
The Perl script sysusagegraph is used to draw PNG graphs and write HTML files. As it knows the polling interval given in the configuration file, it can be run at any time. I used to run it every five minutes, but you can run it every hour or less often; the result is the same.
*/5 * * * * /usr/local/sysusage/bin/sysusagegraph > /dev/null 2>&1
Since release v4.0 of SysUsage there is a jQuery plotting replacement for rrdGraph that only writes HTML files with all the JavaScript code needed to let the client browser draw the graphs. To enable this feature you just have to use sysusagejqgraph instead.
*/5 * * * * /usr/local/sysusage/bin/sysusagejqgraph > /dev/null 2>&1
There are some additional resources, JavaScript libraries and CSS files, to install. The SysUsage installer will do the job for you. This removes the requirement for the GD, GD::Graph and GD::Graph3D Perl modules.
sysusage.cfg
If you have changed the default installation path (/usr/local/sysusage) you may need to give these scripts the path to the configuration file as a command line argument using the -c option. To see what arguments can be passed, use the -h or --help option.
Note that since version 3.0 the default configuration path in these scripts is set during installation, so you may no longer need to edit these scripts or give the path of the configuration file as a command line argument.
See the Configuration chapter for more information on how to configure your system monitoring.
Daemon mode
Crond is good for scheduling, but not below one minute. If you want to monitor your system at an interval shorter than a minute, you can run sysusage in daemon mode. To do that, just set INTERVAL to the desired number of seconds in the configuration file and set the DAEMON directive to 1.
Debug mode
Sometimes things don't appear as you expected. The best way to see what is going wrong is to run sysusage in debug mode. This mode lets you see all values extracted from sar and other tools. Use the --debug option for that; this mode prevents sysusage from storing data in the rrdfiles. Command:
/usr/local/sysusage/bin/sysusage --debug
Please run this command and check the result before sending a bug report.
Output
Once sysusage and sysusagegraph have been running for a few cycles, open your favorite browser and take a look at the output directory. By default:
http://my.server.dom/sysusage/
If you have a special URI and/or port, remember to modify the URL configuration directive; without that the web interface will not work.
Configuration
During installation a default configuration file sysusage.cfg is generated. The default settings are good enough to report essential information about your system, but if you want to monitor some processes, queue directories or devices you must edit this file by hand.
Here is the format of the configuration file and all directives. There are three main sections: the first sets the general parameters of the application, the second sets the parameters related to SMTP or Nagios notification when a threshold is exceeded, and the last configures all types of system information you may want to monitor.
Full sample of configuration file:
[GENERAL]
DEBUG = 0
DATA_DIR = /usr/local/sysusage/rrdfiles
PID_DIR = /usr/local/sysusage/etc
DEST_DIR = /var/www/htdocs/sysusage
SAR_BIN = /usr/bin/sar
UPTIME = /usr/bin/uptime
HOSTNAME = /bin/hostname
INTERVAL = 60
SKIP = 12:00/14:00 20:00/06:00
HDDTEMP_BIN = /usr/local/sbin/hddtemp
SENSORS_BIN = /usr/bin/sensors
DAEMON = 0
GRAPH_WIDTH = 550
GRAPH_HEIGHT= 200
FLAMING = 0
HIRES = 0
LINE_SIZE = 2
PROC_QSIZE = 4
RESRC_URL =
SSH_BIN = /usr/bin/ssh
SSH_OPTION = -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
SSH_USER =
SSH_IDENTITY=
[ALARM]
WARN_MODE = 0
ALARM_PROG = /usr/local/sysusage/bin/sysusagewarn
SMTP = localhost
FROM = root@localhost
TO = root@localhost
NAGIOS = /usr/local/nagios/bin/submit_check_result
UPPER_LEVEL = 1
LOWER_LEVEL = 2
URL =
[MONITOR]
load:threshold_max_value
blocked:threshold_max_value
cpu:threshold_max_value
cswch:threshold_max_value
intr:threshold_max_value
mem:threshold_max_value
dirty:threshold_max_value
swap:threshold_max_value
work:threshold_max_value
share:threshold_max_value
sock:threshold_max_value
socktw:threshold_max_value
io:threshold_max_value
file:threshold_max_value
page:threshold_max_value
pcrea:threshold_max_value
pswap:threshold_max_value
net:threshold_max_value
tcp:threshold_max_value
err:threshold_max_value
disk:threshold_max_value
proc:proc_name:threshold_max_value:threshold_min_value
tproc:proc_name:threshold_max_value:threshold_min_value
queue:path_queue_dir:threshold_max_value
hddtemp:device:threshold_max_value
dev:device(alias):threshold_max_value
dev:device(alias):rpm_speed:raid_type:nb_disk
work:threshold_max_value
sensors:pattern:threshold_max_value
temp:device:threshold_max_value
fan:device:threshold_max_value
huge:threshold_max_value
[PLUGIN testplug]
title:Sysage Test plugin
menu:Database
enable:no
program:/usr/local/sysusage/plugins/plugin-sample.pl
minThreshold:0
maxThreshold:10
verticalLabel:Number of seconds
label1:Total seconds
label2:
label3:
legend1:seconds
legend2:
legend3:
remote:yes
[REMOTE hostname1]
enable:no
ssh_user:monitor
ssh_identity:/home/monitor/.ssh/id_rsa
#ssh_options: -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
#ssh_command:
remote_sysusage:/usr/local/sysusage/bin/rsysusage
#[GROUP Web Servers]
#hostname1
#hostname2
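To make the layout of the sample file easier to picture, here is a minimal Python sketch that groups its lines by section header. It is not the parser used by SysUsage: the function names parse_sections and general_settings are hypothetical, and treating lines starting with '#' as comments is an assumption based on the commented-out lines in the sample above.

```python
import re

# Matches headers such as [GENERAL], [PLUGIN testplug] or [GROUP Web Servers].
SECTION_RE = re.compile(r"^\[(?P<kind>[A-Z]+)(?:\s+(?P<name>[^\]]+))?\]$")


def parse_sections(text):
    """Group configuration lines by (section kind, section name)."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: empty lines and lines starting with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        match = SECTION_RE.match(line)
        if match:
            current = (match.group("kind"), match.group("name") or "")
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def general_settings(lines):
    """Split 'KEY = value' lines (GENERAL/ALARM style) into a dict."""
    settings = {}
    for line in lines:
        # Split on the first '=' only, since values may contain '=' too.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


SAMPLE = """\
[GENERAL]
INTERVAL = 60
DAEMON = 0
[MONITOR]
load:12(3)
disk:90
[REMOTE hostname1]
enable:no
#ssh_command:
"""

if __name__ == "__main__":
    parsed = parse_sections(SAMPLE)
    print(general_settings(parsed[("GENERAL", "")]))
    print(parsed[("MONITOR", "")])
```

Keeping the raw lines per section, rather than forcing everything into key/value pairs, reflects the fact that MONITOR and GROUP entries do not follow the KEY = value shape.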
Section GENERAL
- DEBUG = 0|1
This option is used to set debug mode. If set to 1 then sysusage and sysusagegraph just show what they do but don't create or send anything.
- DATA_DIR = /path/to/rrdfiles
This option is used to set the output directory for all RRDTOOL databases.
- PID_DIR = /path/to/piddir
sysusage and sysusagegraph use a file to store the pid of the running process to prevent simultaneous runs.
- DEST_DIR = /path/to/html_output
Set the path to the directory where all HTML and graph files should be created.
- SAR_BIN = /path/to/sar_binary
sysusage uses sar, part of the sysstat distribution, to grab system information, so it needs to know where it is.
- UPTIME = /path/to/uptime_binary
sysusagegraph reports the current uptime of the system using the uptime command. This sets the path to the uptime binary.
- HOSTNAME = /path/to/hostname_binary
All scripts of the SysUsage distribution need to know the name of the host. They use the hostname command for that.
- INTERVAL = pull_interval_in_second
All RRDTOOL input uses the given interval in seconds to store monitored values. Graph construction also uses this interval to render things properly. By default SysUsage uses an interval of 60 seconds to have a better statistic report. You can change this but it is not recommended. If you change it, adjust your crontab to the same value. This value must be between 10 and 300 seconds. If you want to go below one minute you must use daemon mode to run sysusage. See DAEMON below.
- SKIP = HH:MM/HH:MM HH:MM/HH:MM ...
You can define here some time ranges during which monitoring will not be done. The value is a list of begin_time/end_time pairs separated by spaces or tabs. Let's say you don't want to monitor the host during the night for some good reason; you can write it like this: 20:00/06:00
- HDDTEMP_BIN = /path/to/hddtemp_binary
You can monitor your hard drive temperature if you have installed the hddtemp utility. This sets the path to the hddtemp binary.
- SENSORS_BIN = /path/to/sensors_binary
You can monitor your device temperature if you have installed the lm_sensors utility. This sets the path to the sensors binary.
- DAEMON = 0 | 1
You can monitor your system below the crond limit of 1 minute by running sysusage in daemon mode with an INTERVAL between 10 and 60 seconds.
- GRAPH_WIDTH and GRAPH_HEIGHT
These are useful if you want to resize the graph dimensions. Default is a width of 550 pixels and a height of 200.
- FLAMING
This is for fun: if you want a random flaming effect on graphs with only one dataset, set this directive to 1. Disabled by default. Not used with the jQuery graph renderer.
- HIRES
Allows the addition of an hourly graph to get finer granularity of the data. This is disabled by default. Set it to any integer from 1 to 23 (hours) to show data from the past N hours to now. Not used with the jQuery graph renderer, as the JavaScript library lets you zoom to the resolution you want.
- LINE_SIZE
By default the graph line size is 1; if you want graphs with a thicker line set it to 2. This is an rrd graph limitation (1 or 2). Not used with the jQuery graph renderer.
- PROC_QSIZE
Number of simultaneous remote sysusage call processes that should be run. Default is 4 but it can be up to 15 or more depending on the hardware configuration. One per core is the lowest value you should consider.
- RESRC_URL
Image, JavaScript and CSS resources are by default searched for in the DEST_DIR directory, so that in the HTML view they all stay in the current main directory. You may want to place those resources in another directory or another location. Using this directive you can set any FQDN, absolute or relative URL for these resources.
- SSH_IDENTITY
Used to set the default identity file to connect to all remote hosts without a password. If undefined, sysusage will use the ssh system default value. You may want to use the default value unless you know exactly what you are doing.
- SSH_OPTION
Used to set the default ssh options, which correspond to passwordless authentication:
-o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
with a five-second connection timeout. You may want to increase this timeout on very slow network links.
Do not change this value unless you know exactly what you are doing.
- SSH_BIN
Path to the ssh command is set here at install time.
- SSH_USER
Used to define the default ssh user that will be used to connect to all remote hosts.
Section ALARM
- WARN_MODE = 0|1
Used to disable/enable alert messages when a threshold is exceeded.
- ALARM_PROG = /path/to/sysusagewarn
Used to set the path to the external program responsible for sending alarm messages. You can change it to your own; just take a look at the sysusagewarn usage to see what command line options are used by sysusage.
- SMTP = smtp.server.net
Name or IP address of the SMTP server to contact. Default is none => no SMTP message is sent.
- FROM = sender@localhost
Sender email address to use in the SMTP message.
- TO = destination@localhost
Destination email address where the alarm message will be sent.
- NAGIOS = /usr/local/nagios/bin/submit_check_result
Path to the external nsca program used to send check messages to Nagios. Setting this will activate Nagios check reports. See the end of this file for how to configure Nagios.
- UPPER_LEVEL = 1
Nagios check level to send when a high threshold limit is reached. Default is 1 => WARNING.
- LOWER_LEVEL = 2
Nagios check level to send when a low threshold limit is reached. Default is 2 => CRITICAL.
- URL = Url of Sysusage report
Used to overwrite the default URL of SysUsage report http://host.dom/sysusage/ especially if you have a special port or a different path. Example: http://hostname.domain:9080/Reports/Sysusage/
- SKIP = HH:MM/HH:MM HH:MM/HH:MM ...
You can define here some time ranges during which alarm notices will not be sent. The value is a list of begin_time/end_time pairs separated by spaces or tabs. Let's say you don't want to receive notices during the night for some good reason; you can write it like this: 20:00/06:00
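The SKIP syntax used in both the GENERAL and ALARM sections can be illustrated with a short Python sketch. This is not the SysUsage implementation: the function names are hypothetical, and two points are assumptions rather than documented behaviour, namely that a range whose end is earlier than its start (such as 20:00/06:00) wraps past midnight, and that the end time itself is excluded.

```python
from datetime import time


def parse_skip(value):
    """Parse 'HH:MM/HH:MM HH:MM/HH:MM ...' into a list of (begin, end) times."""
    windows = []
    # split() without arguments handles both spaces and tabs.
    for item in value.split():
        begin, end = item.split("/")
        begin_h, begin_m = (int(part) for part in begin.split(":"))
        end_h, end_m = (int(part) for part in end.split(":"))
        windows.append((time(begin_h, begin_m), time(end_h, end_m)))
    return windows


def is_skipped(now, windows):
    """Return True if 'now' falls inside one of the skip windows."""
    for begin, end in windows:
        if begin <= end:
            # Ordinary window within a single day, e.g. 12:00/14:00.
            if begin <= now < end:
                return True
        else:
            # Assumed behaviour: the window wraps past midnight, e.g. 20:00/06:00.
            if now >= begin or now < end:
                return True
    return False


if __name__ == "__main__":
    windows = parse_skip("12:00/14:00 20:00/06:00")
    for hour in (5, 9, 13, 23):
        print(hour, is_skipped(time(hour, 0), windows))
```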
Section MONITOR
This section has two different formats. The first one is used to specify most of the monitoring targets:
type:threshold_max
or
type:threshold_max(attempt)
- type
Type of system information you may want to monitor. It can take around 30 different values:
load    => monitor load average
blocked => monitor tasks blocked waiting for I/O
cpu     => monitor each cpu(s) user/nice/system usage
        => monitor each cpu(s) total/iowait usage
        => monitor each cpu(s) steal/guest usage
cpuall  => monitor global cpu(s) statistics
cswch   => monitor context switches usage
intr    => monitor number of interrupts per second
mem     => monitor memory usage
dirty   => monitor active/inactive/dirty memory
share   => monitor POSIX shared memory usage (/dev/shm)
swap    => monitor swap usage
work    => monitor amount of memory needed for current workload
sock    => monitor number of open sockets
socktw  => monitor number of sockets in TIME_WAIT state
io      => monitor I/O request and block usage
page    => monitor I/O page usage
pswap   => monitor I/O page swap usage
pcrea   => monitor number of processes created per second
proc    => monitor number of running processes
tproc   => monitor number of running threads
file    => monitor number of open files
queue   => monitor number of files in queue
net     => monitor I/O network bytes on all network interfaces
err     => monitor bad packets, drops and collisions on interfaces
tcp     => monitor number of tcp connections and segments
disk    => monitor disk space usage
dev     => monitor percentage of CPU time per device
        => monitor average request queue length
        => monitor I/O sectors read and written to device
        => monitor time spent in queue (await)
        => monitor time spent in servicing (svctm)
sensors => monitor fan and device temperature using sensors command
hddtemp => monitor disk drive temperature
temp    => monitor device temperature using sar
fan     => monitor fan rotation using sar
huge    => monitor size of hugepages utilisation
Note: the 'cpu' monitoring target type reports all statistics per cpu. This can represent a lot of information if you have several cpus. To limit statistics to the total cpu only, you must replace the default 'cpu' target with 'cpuall' in your configuration file.
- threshold_max
This is the maximum threshold value. Any value equal to or greater than this one will generate an SMTP and/or Nagios alert if you have enabled it.
- attempt
You can delay the call to the alarm program when a threshold is exceeded by specifying the number of consecutive exceeding attempts before the command is called. Just specify the number of attempts in parentheses right after the min and/or max threshold value. This setting is optional for both threshold values and the default is to send the alarm immediately.
- Special cases
There is a special case for 'disk' usage monitoring that allows the exclusion of some mount points. This is useful if you have hard links or some special devices you don't need to monitor. In the syntax below, exclusion is a semicolon (;) separated list of mount points to exclude from monitoring:
disk:ThresholdMax:exclusion
Ex: disk:90:/home/mondo_image;/home/smb_mountpoint
You can use regexps in your excluded paths.
The other directive with a special syntax is 'dev'. It is constructed as follows:
dev:device(alias):rpm_speed:raid_type:nb_disk
where device is sda, sdb or any device name (without the /dev/), and the alias between parentheses is the name that will be displayed in the user interface instead of the device name. For example:
dev:sdc(ASM disk1):
dev:sdb(/data):
If you plan to use the I/O workload report, SysUsage needs to know the speed of the disk (RPM), the raid type (0,1,5,10) and the number of disks in the raid array to calculate the IOPS. For example, for a 7200 RPM disk with 2 disks in raid 1, you would write:
dev:sdc(ASM disk1):7200:1:2
I/O workload is the relation between the TPS (transfers per second) and the IOPS (I/O operations per second) of a device. If the tps returned by sysstat reaches the maximum theoretical IOPS, your storage subsystem is saturated. Here is the equation to calculate the maximum theoretical IOPS:
d = number of disks
dIOPS = IOPS per disk
%r = % of read workload
%w = % of write workload
F = raid factor
IOPS = (d * dIOPS) / (%r + (F * %w))
This gives the theoretical maximum IOPS for a RAID set (excluding caching of course). To obtain it, take the product of the number of disks and the IOPS per disk, divided by the sum of the %read workload and the product of the raid factor and the %write workload. %read and %write are calculated from the following equations:
%r = rd_sec / (rd_sec + wr_sec)
%w = wr_sec / (rd_sec + wr_sec)
This IOPS monitoring is built following the excellent article by Nick Anderson, "Analyzing I/O performance in Linux".
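As a worked example of the equation above, the following Python sketch computes the read/write ratios and the maximum theoretical IOPS. The numbers are illustrative assumptions, not values taken from SysUsage: 75 IOPS per disk is a hypothetical figure for a 7200 RPM drive, a raid factor of 2 is the write penalty commonly quoted for raid 1, and expressing the workload as a percentage of the maximum is one possible reading of "the relation between TPS and IOPS".

```python
def read_write_ratio(rd_sec, wr_sec):
    """Return (%r, %w) as fractions, from sectors read and written."""
    total = rd_sec + wr_sec
    if total == 0:
        return 0.0, 0.0
    return rd_sec / total, wr_sec / total


def max_theoretical_iops(disks, iops_per_disk, read_ratio, write_ratio, raid_factor):
    """IOPS = (d * dIOPS) / (%r + (F * %w))."""
    denominator = read_ratio + raid_factor * write_ratio
    if denominator == 0:
        raise ValueError("no read or write activity, IOPS is undefined")
    return (disks * iops_per_disk) / denominator


def workload_percent(tps, max_iops):
    """Assumed interpretation: share of the theoretical maximum in use."""
    return 100.0 * tps / max_iops


if __name__ == "__main__":
    # Hypothetical sample: 300 sectors read and 100 written per second.
    read_ratio, write_ratio = read_write_ratio(300, 100)
    # 2 disks in raid 1, 75 IOPS per disk and raid factor 2 (all assumed).
    limit = max_theoretical_iops(2, 75, read_ratio, write_ratio, 2)
    print(limit)                        # 150 / 1.25
    print(workload_percent(60, limit))  # 60 tps against that limit
```

With these inputs the denominator is 0.75 + 2 * 0.25 = 1.25, so the array tops out at 120 IOPS and a device reporting 60 tps would be at half of its theoretical capacity.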
The second format is used to monitor running processes, hard drive temperature or queue directories. It has the following format:
type:target:threshold_max_value:threshold_min_value
or
type:target:threshold_max_value(attempt):threshold_min_value(attempt)
- type
Type of system information you may want to monitor. It can take these different values:
load, cpu, cswch, intr, mem, swap, work, share, sock, socktw, io, file, page, pcrea, pswap, net, tcp, err, disk, proc, tproc, queue, hddtemp, dev, work, sensors, temp, fan, huge, blocked, dirty
- target
If type is 'proc' or 'tproc', the target is the name of the process to monitor. You can use a regexp as target to match exactly the required process. The number of running processes is obtained with the system command line:
ps -e -o command | grep -E "target" | grep -v grep | wc -l
so you can replace the word target with the regexp to match and check that it returns the right number of processes.
The number of running threads is obtained with the system command line:
ps -eL -o command | grep -E "target" | grep -v grep | wc -l
If type is 'queue', the target is the full path of the directory to monitor. SysUsage will find and count all regular files in the target directory and will not descend into subdirectories.
If type is 'hddtemp', the target is the hard drive device to monitor, e.g. /dev/sda. You can try it with the following command line:
hddtemp -n /dev/sda
This should return the current temperature detected on the hard drive.
If type is 'dev', the target is the device name to monitor, e.g. sda. Do not add the /dev/ prefix, as this will not work. You may want to change the device name shown in the graph menu; this is possible by adding a device alias enclosed in parentheses.
For example, let's say you are monitoring some EMCpower SAN devices. Using sar, the reported devices are dev120-48 and dev120-64. First find which partitions are mapped to these devices (by reading /proc/partitions). In this example these devices are mounted as /cache1 and /cache2, so we want to see these mount points instead of the device numbers in the graph menu. Adding
dev:dev120-48(/cache1):90
dev:dev120-64(/cache2):97
to your sysusage.cfg file will do the job. The threshold_max value is the maximum percentage of CPU used for this device before an alarm is sent.
If type is 'sensors', the target is the pattern to match to obtain temperature or fan speed information in the sensors program output. See the Sensors chapter for more information.
If type is 'temp' or 'fan', the target is the device number reported by sar to obtain temperature or fan speed information. To find out which device number must be used, see the result of the command: sar -m ALL 1 1
- threshold_max
This is the maximum threshold value. Any value equal to or greater than this one will generate an SMTP and/or Nagios alert if you have enabled it.
- threshold_min
This is the minimum threshold value. Any value equal to or lower than this one will generate an SMTP and/or Nagios alert if you have enabled it. The min threshold should probably only be used with the 'proc' and 'tproc' monitoring types. If you set it to 0 you will be warned if any of the monitored processes is down.
- attempt
You can delay the call to the alarm program when a threshold is exceeded by specifying the number of consecutive exceeding attempts before the command is called. Just specify the number of attempts in parentheses right after the min and/or max threshold value. This setting is optional for both threshold values and the default is to send the alarm immediately.
For example a load average monitoring defined like this
load:12(3)
will send an alarm when the system load average exceeds 12 for three consecutive attempts at the defined interval. If the interval is 60 seconds, the alarm will be sent up to 180 seconds after the first exceeded value.
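The attempt mechanism can be sketched in a few lines of Python. This only illustrates the behaviour described above and is not how SysUsage stores its counters; the names parse_threshold and alarm_points are hypothetical, and resetting the counter as soon as one value falls back under the threshold is an assumption derived from the word "consecutive".

```python
import re

# Matches '12', '12.5' or '12(3)': a value with an optional attempt count.
THRESHOLD_RE = re.compile(r"^(?P<value>-?\d+(?:\.\d+)?)(?:\((?P<attempt>\d+)\))?$")


def parse_threshold(text):
    """Return (threshold, attempts); attempts defaults to 1 (immediate alarm)."""
    match = THRESHOLD_RE.match(text.strip())
    if match is None:
        raise ValueError(f"invalid threshold: {text!r}")
    attempts = int(match.group("attempt") or 1)
    return float(match.group("value")), attempts


def alarm_points(samples, maximum, attempts):
    """Return the indexes of the samples at which an alarm would be raised."""
    alarms = []
    consecutive = 0
    for index, value in enumerate(samples):
        if value >= maximum:
            consecutive += 1
            if consecutive >= attempts:
                alarms.append(index)
        else:
            # Assumed: one value under the threshold resets the count.
            consecutive = 0
    return alarms


if __name__ == "__main__":
    maximum, attempts = parse_threshold("12(3)")
    loads = [13, 14, 5, 12, 12.5, 15, 3]
    print(alarm_points(loads, maximum, attempts))
```

In this sample the first two high values do not trigger anything because the third one drops to 5; the alarm is only raised at the sixth sample, after three consecutive values at or above 12.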
Section PLUGIN
This part enables the use of custom plugins. You can call any program or script provided that it returns up to 3 numbers separated by a space character. See the plugins/ directory for sample scripts.
This section must include a name composed of any alphanumeric characters that will be used to create the target file, for example:
[PLUGIN testplug1]
or
[PLUGIN testplug2]
The section allows the following configuration directives. They are composed of a directive name followed by ':' or '=' and a value.
- enable
Used to temporarily disable the plugin monitoring. Default is 'yes' (enabled). To disable it, write enable:no
- program
Is used to set the path to the program or script to execute as plugin. This program must print to STDOUT 1 to 3 numbers separated by a space character as result following the number of reports you want. So each plugin can have 1, 2 or 3 graphed data.
- title
Is used to set the title of the report page and the index link. Default is set to “Sysusage plugin”.
- menu
Is used to store the plugin under a submenu of the plugins menu. Default is to store plugin under the “Others” submenu.
- maxthreshold
This is the maximum threshold value. Any value equal to or greater than this one will generate an SMTP and/or Nagios alert if you have enabled it.
- minthreshold
This is the minimum threshold value. Any value equal to or lower than this one will generate an SMTP and/or Nagios alert if you have enabled it.
- verticallabel
This is used to set the vertical label of the graph.
- label1, label2, label3
Are used to show a legend for each graphed data, label1 is for the first returned value, label2 for the second and label3 for the last. If you just have one value returned just omit the other labels.
- legend1, legend2, legend3
These are used to set the units for the Current, Avg and Max values.
- remote
This directive must be set to 'no' to prevent execution of the plugin program through an ssh call to sysusage in a remote context. This directive is enabled by default ('yes').
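Since a plugin only has to print 1 to 3 numbers separated by a space on STDOUT, a plugin program can be very small. The following Python sketch is a hypothetical example, not one of the sample scripts shipped in plugins/: it reports the total, used and free space of the root filesystem in GiB, and the two-decimal output format is an arbitrary choice.

```python
#!/usr/bin/env python3
import shutil


def format_values(values):
    """Format 1 to 3 numbers as a single space-separated line."""
    if not 1 <= len(values) <= 3:
        raise ValueError("a plugin must return between 1 and 3 values")
    return " ".join(f"{float(value):.2f}" for value in values)


def main():
    # Illustrative data source: space on the root filesystem, in GiB.
    usage = shutil.disk_usage("/")
    gib = 2 ** 30
    print(format_values([usage.total / gib, usage.used / gib, usage.free / gib]))


if __name__ == "__main__":
    main()
```

Such a script would be referenced by the program directive, with label1, label2 and label3 naming the three values in the order they are printed.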
Section REMOTE
This part allows sysusage to be run on remote hosts from a central server. It uses ssh to execute sysusage on the destination host with the -r option, which forces sysusage not to write anything to local data files but to print all results to stdout. As sysusage is run by a cron job or in daemon mode, it cannot authenticate interactively to the remote host, so you must give an ssh user and an identity file with the corresponding configuration options.
This section must include the name or the ip address of the remote host that will be used to create the target data directory, for example:
[REMOTE hostname]
or
[REMOTE host.domain.dom]
or
[REMOTE 192.168.1.14]
The section allow the following configuration directives. They are composed of named directives followed by ':' or '=' and a value.
Once you have installed sysusage on all remote host and exchange the SSH key certificat between the central host and all remote hosts, most of the time you just have to set the ssh_user directive to have it working. Use remote_sysusage directive if sysusage perl script is not installed on the same place than the central server.
Section GROUP
This section allows you to group remote host reports under a common group name in the index page. Remote hosts will be ordered by their parent group. The name of the group can be any string and the values in the section must be a list of remote servers defined in the REMOTE sections.
For example, if you are monitoring a cluster of web and database servers, you can use the following declaration:
[GROUP Web Servers]
webhost1
webhost2
webhost3
[GROUP Database Servers]
dbhost1
dbhost2
Of course the webhostN and dbhostN hosts must be declared in REMOTE sections.
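To illustrate the relationship between GROUP and REMOTE sections, here is a minimal sketch that reads GROUP declarations and reports hosts with no matching REMOTE section. It is not how sysusage parses its configuration file; the function names parse_groups and undeclared_hosts are hypothetical, and the sketch accepts host names separated by any whitespace.

```python
import re


def parse_groups(text):
    """Map each [GROUP name] section to its list of hosts (hypothetical)."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = re.match(r"\[GROUP\s+(.+?)\]$", line)
        if match:
            current = match.group(1)
            groups[current] = []
        elif line.startswith("["):
            current = None  # another kind of section, ignored here
        elif current is not None:
            groups[current].extend(line.split())
    return groups


def undeclared_hosts(groups, remote_hosts):
    """Return grouped hosts that have no [REMOTE host] section."""
    known = set(remote_hosts)
    return sorted(h for hosts in groups.values() for h in hosts if h not in known)


CONFIG = """
[GROUP Web Servers]
webhost1
webhost2
webhost3
[GROUP Database Servers]
dbhost1
dbhost2
"""

groups = parse_groups(CONFIG)
print(groups)
print(undeclared_hosts(groups, ["webhost1", "webhost2", "webhost3", "dbhost1"]))
```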
- enable
Used to enable or disable monitoring of the remote host. The default is 'yes' (enabled). Set 'enable=no' to disable it.
- ssh_user
Used to define the ssh user allowed to connect to the remote host. By default the value of the SSH_USER configuration option in the GENERAL section is used.
- ssh_identity
Used to set the identity file used to connect to the remote host without a password. By default the value of the SSH_IDENTITY configuration option in the GENERAL section is used. This is usually the private key that you generated with ssh-keygen, most of the time the file $HOME/.ssh/id_rsa. Keep the default value unless you know exactly what you are doing.
- ssh_options
Used to override the default ssh options, which are:
-o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
The default options are set in the SSH_OPTIONS configuration option in the GENERAL section. Keep the default value unless you know exactly what you are doing.
- ssh_command
You can override the complete ssh command with this directive. It replaces the ssh command, the ssh options, the ssh user and the host part. The remote sysusage command is not replaced. Keep the default value unless you know exactly what you are doing.
- remote_sysusage
Use it to set the path to the rsysusage command that must be used on the remote host. SysUsage will automatically add the -r option to trigger the remote execution mode.
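The way these directives combine can be sketched as below. This is an assumption-laden illustration, not the command line sysusage really builds: the function build_remote_command is hypothetical, the use of ssh's -i flag for the identity file is an assumption, and the user, host and paths in the example are placeholders. Only the default ssh options and the appended -r option come from the text above.

```python
import shlex

# Default ssh options quoted in the ssh_options description.
DEFAULT_SSH_OPTIONS = "-o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey"


def build_remote_command(host, ssh_user, remote_sysusage,
                         ssh_identity=None,
                         ssh_options=DEFAULT_SSH_OPTIONS,
                         ssh_command=None):
    """Return the argument list of a remote call (hypothetical sketch)."""
    if ssh_command:
        # ssh_command replaces the ssh command, options, user and host part.
        prefix = shlex.split(ssh_command)
    else:
        prefix = ["ssh"]
        if ssh_identity:
            prefix += ["-i", ssh_identity]  # assumption: identity passed with -i
        prefix += shlex.split(ssh_options)
        prefix.append(f"{ssh_user}@{host}")
    # The remote command itself is never replaced; -r is always appended.
    return prefix + [remote_sysusage, "-r"]


# Placeholder user, key and remote path, for illustration only.
cmd = build_remote_command("192.168.1.14", "monitor",
                           "/path/to/remote/sysusage",
                           ssh_identity="/home/monitor/.ssh/id_rsa")
print(" ".join(shlex.quote(part) for part in cmd))
```

Building a list of arguments rather than a single string avoids quoting problems if the command is later run without a shell.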
Threshold Notification
SMTP alert
Sysusage uses an external perl script to send SMTP alerts and/or Nagios checks when a max or min threshold is reached. This program is named sysusagewarn. All options in the [ALARM] section of the configuration file are used by sysusage to call this program. If they are set correctly, you do not have to care about the parameters given to this program. If you want to use this program outside sysusage, here are the command line options it understands:
Usage: sysusagewarn -t subject -c current_value -v threshold_value [-s smtp_srv] [-f from] [-d to] [-b hostname_prog]
-t subject : Subject of the alarm
-c value : Current value monitored by sysusage
-v value : Threshold value used.
-s host : SMTP server name or ip where to send email.
-f from : Sender email address of the alarm message.
-d to : Destination address of the alarm message.
-b path : Path to program hostname. Default is /bin/hostname
-n path : Path to Nagios program submit_check_result. Default none.
-l value : Alarm level (0=OK,1=WARNING,2=CRITICAL). Default: 1.
-r service : Nagios service name to used. Must be any sysusage type of monitoring defined in the configuration file.
-u url : Url to HTML sysusage output to include in email. Default: http://hostname.domain/sysusage/
-h : Output this message and exit
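If you call sysusagewarn from your own script, the argument list could be assembled along these lines. This is a minimal sketch using only the -t, -c, -v, -s, -f, -d and -l options from the usage message above; the helper name build_warn_args and the example values are hypothetical, and the sketch only builds the list without running the program.

```python
def build_warn_args(subject, current, threshold,
                    smtp_server=None, sender=None, recipient=None, level=None):
    """Build a sysusagewarn argument list (hypothetical helper)."""
    # -t, -c and -v are the mandatory options of the usage message.
    args = ["sysusagewarn", "-t", subject, "-c", str(current), "-v", str(threshold)]
    optional = (("-s", smtp_server), ("-f", sender), ("-d", recipient), ("-l", level))
    for flag, value in optional:
        if value is not None:
            args += [flag, str(value)]
    return args


# Illustrative values only; level 2 means CRITICAL according to the usage text.
args = build_warn_args("CPU Temp", 76.5, 75, smtp_server="localhost", level=2)
print(args)
```

The resulting list can be handed to subprocess.run(args) on a host where sysusagewarn is installed and in the PATH.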
NAGIOS alert
SysUsage sends check messages to Nagios through an external command (submit_check_result). You therefore need to create the host in Nagios and associate with it every sysusage service that you want to monitor. The service names correspond to the type of monitoring. For example, if you have enabled the alarm on memory usage, the service sent is 'mem'. There are also special cases for monitoring types with multiple instances, such as network monitoring, where you need to create one service per instance. For example, type 'net' will have 'net_eth0' and 'net_lo', and more if you have more network interfaces. To check that your sysusage alarm messages are understood by Nagios, look at the nagios.log file (by default /usr/local/nagios/var/nagios.log).
To clear an alarm reported to Nagios automatically, SysUsage sends an OK request each time it runs if everything is correct for the monitored type.
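The service naming rule above can be summarised with a small sketch, which may help when generating the list of Nagios services to declare. The function nagios_services is hypothetical; only the 'mem', 'net_eth0' and 'net_lo' names come from the text.

```python
def nagios_services(monitor_type, instances=()):
    """List the Nagios service names for a monitoring type (hypothetical)."""
    if not instances:
        # Single-instance types use the type name directly, e.g. 'mem'.
        return [monitor_type]
    # Multi-instance types get one service per instance, e.g. 'net_eth0'.
    return [f"{monitor_type}_{name}" for name in instances]


print(nagios_services("mem"))
print(nagios_services("net", ["eth0", "lo"]))
```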
Sensors
Monitoring of sensors output is based on regular expressions. To make this clear, here is an example.
Sensors output on my server:
adt7463-i2c-0-2d
Adapter: SMBus I801 adapter at 1480
V1.5: +3.23 V (min = +0.00 V, max = +3.32 V)
VCore: +1.24 V (min = +1.10 V, max = +1.49 V)
V3.3: +3.33 V (min = +2.80 V, max = +3.78 V)
V5: +4.99 V (min = +4.25 V, max = +5.75 V)
V12: +0.11 V (min = +0.00 V, max = +15.94 V)
CPU_Fan: 0 RPM (min = 0 RPM)
fan2: 10671 RPM (min = 8095 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
CPU Temp: +69.5 C (low = +2.0 C, high = +91.0 C)
Board Temp: +32.5 C (low = +2.0 C, high = +83.0 C)
Remote Temp: +31.2 C (low = +2.0 C, high = +58.0 C)
cpu0_vid: +1.338 V

adt7463-i2c-0-2e
Adapter: SMBus I801 adapter at 1480
V1.5: +3.21 V (min = +0.00 V, max = +3.32 V)
VCore: +1.28 V (min = +1.10 V, max = +1.49 V)
V3.3: +3.32 V (min = +2.80 V, max = +3.78 V)
V5: +4.95 V (min = +0.00 V, max = +6.64 V)
V12: +0.11 V (min = +0.00 V, max = +15.94 V)
CPU_Fan: 10843 RPM (min = 8095 RPM)
fan2: 0 RPM (min = 0 RPM)
fan3: 9642 RPM (min = 8095 RPM)
fan4: 0 RPM (min = 0 RPM)
CPU Temp: +57.2 C (low = +2.0 C, high = +91.0 C)
Board Temp: +35.2 C (low = +2.0 C, high = +91.0 C)
Remote Temp: +35.8 C (low = +2.0 C, high = +58.0 C)
cpu0_vid: +1.338 V
Depending on which sensors kernel modules are loaded, you may get more or less output than this. To monitor all the temperature sensors on my server, I need to add the following lines to sysusage.cfg:
sensors:CPU Temp:75
sensors:Board Temp:45
sensors:Remote Temp:45
This will create 3 graphs: one based on lines matching 'CPU Temp', another on lines matching 'Board Temp' and the last on lines matching 'Remote Temp'. As I have 2 CPUs, each graph will show 2 values. You cannot report more than 3 values per graph; this is hard coded into sysusage. So if you have more CPUs you will not see more than 3 values. Here an alarm is sent when the temperature exceeds the given values (75, 45, 45).
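The regexp matching described above can be pictured with the following sketch. It is not the code used by sysusage; the function matching_values is hypothetical and simply collects the first number after the label on every line matching the pattern, keeping at most 3 values as stated in the text. The sample lines are taken from the sensors output shown earlier.

```python
import re

SENSORS_OUTPUT = """\
CPU Temp: +69.5 C (low = +2.0 C, high = +91.0 C)
Board Temp: +32.5 C (low = +2.0 C, high = +83.0 C)
Remote Temp: +31.2 C (low = +2.0 C, high = +58.0 C)
CPU Temp: +57.2 C (low = +2.0 C, high = +91.0 C)
Board Temp: +35.2 C (low = +2.0 C, high = +91.0 C)
Remote Temp: +35.8 C (low = +2.0 C, high = +58.0 C)
"""


def matching_values(output, pattern, limit=3):
    """Return up to `limit` readings from lines matching `pattern`."""
    values = []
    for line in output.splitlines():
        if not re.search(pattern, line):
            continue
        # Look for the reading after the first ':' so that digits in the
        # label itself (for example 'Core 0') are not picked up.
        _, _, rest = line.partition(":")
        number = re.search(r"[-+]?\d+(?:\.\d+)?", rest)
        if number:
            values.append(float(number.group()))
    return values[:limit]


cpu = matching_values(SENSORS_OUTPUT, "CPU Temp")
print(cpu)
print(any(value > 75 for value in cpu))
```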
To monitor fan speed, I just add lines like this in the configuration file:
sensors:fan2:11000:8095
sensors:fan3:11000:8095
This will create 2 graphs, one for fan 2 and one for fan 3, with an alarm sent when the speed exceeds 11000 RPM or is lower than 8095 RPM.
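Judging from the examples in this section, such a line appears to carry a type, a target, a maximum threshold and an optional minimum threshold, separated by colons. The sketch below splits a line on that assumption; the function parse_monitor_line and the dictionary keys are hypothetical, and the real parser in sysusage may accept other forms.

```python
def parse_monitor_line(line):
    """Split 'type:target:max[:min]' into a dict (assumed layout)."""
    fields = line.strip().split(":")
    if len(fields) < 2:
        raise ValueError(f"expected at least type:target, got {line!r}")

    def number(index):
        # Missing or empty threshold fields are reported as None.
        if len(fields) > index and fields[index] != "":
            return float(fields[index])
        return None

    return {
        "type": fields[0],
        "target": fields[1],
        "max": number(2),
        "min": number(3),
    }


print(parse_monitor_line("sensors:fan2:11000:8095"))
print(parse_monitor_line("sensors:CPU Temp:75"))
```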
On my personal computer (/etc/sysconfig/lm_sensors => modprobe coretemp) sensors output is:
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +53.0 C (high = +78.0 C, crit = +100.0 C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1: +50.0 C (high = +78.0 C, crit = +100.0 C)
To monitor CPU temperature, I just add this line to my sysusage.cfg:
sensors:Core:70
This will generate one graph with 2 values, for Core 0 and Core 1.
Now that sysstat's sar natively reports device temperature and fan speed, you no longer need sensors. Type 'temp' can be used for temperature and type 'fan' for fan speed. The target of these types is the device number; see sar -m TEMP or sar -m FAN to find which device number to monitor.
Bugs / Feature Request
Please report any bugs, remarks and feature requests using the Github interface at https://github.com/darold/sysusage/ or send a mail to the author.
License
Copyright (C) 2003-2018 Gilles Darold
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Author
Gilles Darold <gilles _|_At_|_ darold _|_DoT_|_ net>
Acknowledgment
I want to thank all the people who helped build this tool, with very special thanks to Marat Dyatko for the web design contribution.