If it isn't measured and monitored, you aren't managing it.
Historical monitoring captures events that occur over a period of time, in order to determine trends. This is used for baselining.
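As an illustration of baselining, here is a minimal sketch that treats a list of historical samples as the baseline and flags a new reading that falls far outside it. The sample values, the function names, and the three-standard-deviation cutoff are assumptions made for this example, not part of any particular tool.

```python
from statistics import mean, stdev


def baseline(samples):
    """Return (mean, standard deviation) of the historical samples."""
    return mean(samples), stdev(samples)


def is_anomalous(value, samples, n_sigma=3.0):
    """True if value lies more than n_sigma standard deviations from the mean."""
    avg, sd = baseline(samples)
    return abs(value - avg) > n_sigma * sd


# Hypothetical daily CPU utilization percentages collected over one week.
history = [22.0, 25.0, 24.0, 23.0, 26.0, 24.0, 25.0]

if __name__ == "__main__":
    for reading in (25.0, 60.0):
        status = "outside baseline" if is_anomalous(reading, history) else "normal"
        print(f"{reading:.1f}% CPU: {status}")
```

In practice the history would come from collected data (for example, output saved from sar) rather than a hard-coded list.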
Service availability monitoring shows events of interest as they occur, and is used to spot trouble. This needs an alerting system to be useful.
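A rough sketch of the kind of check an availability monitor might run: measure how full a filesystem is and map the result to an alert level. The 80% and 95% thresholds and the function names are assumptions chosen for illustration; a real alerting system would also deliver the alert to someone.

```python
import shutil


def classify(used_pct, warn_pct=80.0, crit_pct=95.0):
    """Map a usage percentage to an alert level."""
    if used_pct >= crit_pct:
        return "CRITICAL"
    if used_pct >= warn_pct:
        return "WARNING"
    return "OK"


def check_disk(path="/", warn_pct=80.0, crit_pct=95.0):
    """Return (alert level, percent used) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    return classify(used_pct, warn_pct, crit_pct), used_pct


if __name__ == "__main__":
    level, pct = check_disk("/")
    print(f"/ is {pct:.1f}% full: {level}")
```

Run periodically (say, from cron), a check like this only becomes useful once anything other than OK is sent on to a person.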
Monitor everything deemed important. A resource is important if you'll get into trouble with your PHB (pointy-haired boss) when that resource runs out. In a regulated industry, some things must be monitored and others must not be. Some things should be monitored historically for audit/security purposes.
You need a policy that:
top
df
free
sar
ac
w
hdparm
smartctl
i2cdump
sensors
logwatch
logcheck
swatch
logsurfer
webalizer
An important feature of monitoring and alerting tools is automatic escalation.
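Automatic escalation can be pictured as a table of "who gets notified after how long". The sketch below is one possible way to model it; the contacts, the delays, and the names EscalationStep and contacts_to_notify are hypothetical and not taken from any specific monitoring product.

```python
from dataclasses import dataclass


@dataclass
class EscalationStep:
    after_minutes: int  # minutes the alert has gone unacknowledged
    contact: str        # who is notified once that time has passed


# Hypothetical policy: widen the audience the longer an alert is ignored.
POLICY = [
    EscalationStep(0, "on-call admin"),
    EscalationStep(15, "team lead"),
    EscalationStep(60, "IT manager"),
]


def contacts_to_notify(minutes_unacknowledged, policy=POLICY):
    """Return every contact whose escalation delay has been reached."""
    return [
        step.contact
        for step in policy
        if minutes_unacknowledged >= step.after_minutes
    ]


if __name__ == "__main__":
    for minutes in (0, 20, 90):
        print(minutes, contacts_to_notify(minutes))
```

Acknowledging the alert would stop the clock, so the later contacts are only bothered when the earlier ones did not respond.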
Be careful that monitoring data doesn't use up your network bandwidth. Aim for monitoring traffic to use no more than 1% of a link's capacity, especially over slower links.
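The 1% guideline can be checked with simple arithmetic. The sketch below estimates polling traffic and compares it with 1% of the link speed; the host count, bytes per poll, and polling intervals are made-up figures for illustration, and the estimate ignores protocol overhead and retransmissions.

```python
def monitoring_budget_bps(link_bps, max_fraction=0.01):
    """Bits per second that monitoring may use on a link (default 1%)."""
    return link_bps * max_fraction


def polling_load_bps(hosts, bytes_per_poll, interval_s):
    """Approximate average bits per second generated by periodic polling."""
    return hosts * bytes_per_poll * 8 / interval_s


def within_budget(link_bps, hosts, bytes_per_poll, interval_s):
    """True if the polling load fits inside the monitoring budget."""
    load = polling_load_bps(hosts, bytes_per_poll, interval_s)
    return load <= monitoring_budget_bps(link_bps)


if __name__ == "__main__":
    t1 = 1_544_000  # a T1 link, 1.544 Mbit/s
    print("budget (bit/s):", monitoring_budget_bps(t1))
    print("50 hosts every 60 s fits:", within_budget(t1, 50, 500, 60))
    print("50 hosts every 10 s fits:", within_budget(t1, 50, 500, 10))
```

With these figures the budget on a T1 is about 15,440 bit/s. Polling 50 hosts once a minute uses roughly 3,333 bit/s and fits, while polling every 10 seconds uses 20,000 bit/s and does not.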
Use SNMP to monitor remote servers.
Summarize remote network monitoring with RMON.
(SNMP components: GET and SET commands, a management console, and a MIB for each type of component to monitor.)
See the Cisco SNMP tutorial for more information.
(Command line tools: snmp*, arpsnmp, net-snmp-config.)
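To make the GET command concrete, here is a small sketch that wraps the Net-SNMP snmpget command line tool to read a device's uptime. It assumes snmpget is installed, that the device answers SNMP v2c, and that the community string is "public"; the host address in the example is a placeholder from the documentation address range.

```python
import subprocess

# sysUpTime.0 from the standard MIB-II system group.
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def snmpget_argv(host, oid, community="public"):
    """Build the snmpget command line for an SNMP v2c GET request."""
    return ["snmpget", "-v", "2c", "-c", community, host, oid]


def snmp_get(host, oid, community="public", timeout_s=5):
    """Run snmpget and return its output; raises if the command fails."""
    result = subprocess.run(
        snmpget_argv(host, oid, community),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder; substitute a device you manage.
    print(" ".join(snmpget_argv("192.0.2.10", SYS_UPTIME_OID)))
```

A management console does essentially this for many OIDs on many devices, using the MIB to translate the numeric OIDs into readable names.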
Related to log files, process accounting was used to track system use to allocate system expenses (say, by department).
Full accounting can take a lot of CPU time, RAM, and disk space.
See the sar command for details.