System Monitoring Tutorial

System (and Network) Monitoring Concepts:

If it isn't measured and monitored, you aren't managing it.

Historical monitoring captures events that occur over a period of time, in order to determine trends.  This is used for baselining.
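
As a rough illustration of baselining, the sketch below summarizes a history of samples as a mean and standard deviation, then flags a new reading that falls far outside that range.  The sample values, the function names, and the 3-standard-deviation threshold are all assumptions made for this example, not a recommended policy.

```python
import statistics

def baseline(samples):
    """Return (mean, standard deviation) of the historical samples."""
    return statistics.mean(samples), statistics.stdev(samples)

def is_unusual(value, mean, stdev, k=3.0):
    """True when value lies more than k standard deviations from the mean."""
    return abs(value - mean) > k * stdev

# Hypothetical hourly CPU-utilization percentages collected over time.
cpu_history = [22.0, 25.0, 24.0, 23.0, 26.0, 24.0, 25.0, 23.0]

mean, stdev = baseline(cpu_history)
print(f"baseline: {mean:.1f}% +/- {stdev:.1f}")
print("90% unusual?", is_unusual(90.0, mean, stdev))
print("25% unusual?", is_unusual(25.0, mean, stdev))
```

In practice the history would come from whatever collector you already run, and the threshold would be tuned to how noisy the resource is.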

Service availability monitoring shows events of interest as they occur, and is used to spot trouble.  This needs an alerting system to be useful.
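
A minimal sketch of an availability check tied to an alert might look like the following.  It assumes "available" simply means the service accepts a TCP connection; the host, the port, and the alert callback (print stands in for a pager or mail hook) are hypothetical.

```python
import socket

def tcp_service_up(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

def check_and_alert(host, port, alert):
    """Run one availability check; call alert(message) if the service is down."""
    up = tcp_service_up(host, port)
    if not up:
        alert(f"ALERT: {host}:{port} is not accepting connections")
    return up

if __name__ == "__main__":
    # Hypothetical target: the SSH port on the local machine.
    check_and_alert("127.0.0.1", 22, alert=print)
```

A real monitor would run such a check on a schedule and hand failures to the alerting system rather than printing them.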

Monitoring Policy:

Monitor everything deemed important.  A resource is important if you'll get into trouble with your PHB (pointy-haired boss) when that resource runs out.  In a regulated industry, some things must be monitored and others must not be.  Some things should be monitored historically for audit/security purposes.

You need a policy that:

Some Commonly Monitored Resources on Production Servers:

Tools for Monitoring and Alerting:

An important feature of monitoring and alerting tools is automatic escalation.
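
One way to picture automatic escalation: the longer an alert goes unacknowledged, the more people get notified.  The chain below, including the roles and the time thresholds, is entirely hypothetical; real tools let you configure their own equivalents.

```python
# Hypothetical escalation chain: (minutes unacknowledged, who to notify).
ESCALATION_CHAIN = [
    (0, "on-call admin"),
    (15, "backup admin"),
    (30, "team lead"),
    (60, "IT manager"),
]

def contacts_to_notify(minutes_unacknowledged, chain=ESCALATION_CHAIN):
    """Return everyone who should have been notified by this point."""
    return [who for threshold, who in chain
            if minutes_unacknowledged >= threshold]

print(contacts_to_notify(5))    # ['on-call admin']
print(contacts_to_notify(45))   # ['on-call admin', 'backup admin', 'team lead']
```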

Be careful your network bandwidth isn't used up by your monitoring data.  Aim for no more than 1% used, especially over slower links.
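
The 1% guideline can be checked with simple arithmetic.  The sketch below estimates the average share of a link consumed by polling; the host count, bytes per poll, polling interval, and link speed are made-up figures for illustration only.

```python
def monitoring_bandwidth_fraction(hosts, bytes_per_poll, poll_interval_s, link_bps):
    """Average fraction of the link consumed by polling traffic."""
    bits_per_second = hosts * bytes_per_poll * 8 / poll_interval_s
    return bits_per_second / link_bps

# Hypothetical: 50 hosts, 2000 bytes per poll each, polled every 60 seconds,
# over a 1.544 Mbit/s (T1) link.
fraction = monitoring_bandwidth_fraction(
    hosts=50, bytes_per_poll=2000, poll_interval_s=60, link_bps=1_544_000)

print(f"monitoring uses {fraction:.2%} of the link")
print("within 1% budget:", fraction <= 0.01)
```

With these numbers the polling traffic averages about 13.3 kbit/s, a little under the 1% target; doubling the host count or halving the interval would push it over.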

Use SNMP to monitor remote servers.  Use RMON to summarize monitoring data from remote networks.  (SNMP components: GET and SET commands, a management console, and a MIB for each type of component to monitor.)  See the Cisco SNMP tutorial for more information.  (Command-line tools: snmp*, arpsnmp, net-snmp-config.)
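
As a small sketch of an SNMP GET, the code below wraps the Net-SNMP snmpget command-line tool.  It assumes snmpget is installed, that the agent speaks SNMP v2c, and that the community string is "public"; the helper names and the host name are hypothetical.  The OID shown is sysUpTime.0 from MIB-II.

```python
import shutil
import subprocess

SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0 from MIB-II

def snmpget_command(host, oid, community="public"):
    """Build the argument list for a Net-SNMP snmpget v2c query."""
    return ["snmpget", "-v", "2c", "-c", community, host, oid]

def snmp_get(host, oid, community="public", timeout=5):
    """Run snmpget; return its output, or None if the tool is missing or the query fails."""
    if shutil.which("snmpget") is None:
        return None  # Net-SNMP tools are not installed on this machine
    try:
        result = subprocess.run(snmpget_command(host, oid, community),
                                capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return result.stdout.strip() if result.returncode == 0 else None

# "router1.example.com" is a placeholder host name.
print(" ".join(snmpget_command("router1.example.com", SYS_UPTIME_OID)))
```

Calling snmp_get with a reachable agent's address would return the raw line printed by snmpget; parsing it is left out here.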

Process Accounting:

Related to log files, process accounting has traditionally been used to track system use so that system expenses can be allocated (say, by department).  Full accounting can take a lot of CPU time, RAM, and disk space.  See the sar command for details.