If it isn't measured and monitored, you aren't managing it.
Historical monitoring captures events that occur over a period of time, in order to determine trends. This is used for baselining.
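As a rough illustration of baselining, the sketch below derives a baseline (mean and standard deviation) from historical samples and flags a new reading that falls far outside it. The sample values, the function names, and the "more than k standard deviations" rule are all assumptions made for this example, not something prescribed by any particular tool.

```python
from statistics import mean, stdev

def baseline(samples):
    """Return (mean, sample standard deviation) of the historical samples."""
    return mean(samples), stdev(samples)

def is_anomalous(value, samples, k=3.0):
    """True if value lies more than k standard deviations from the baseline mean."""
    avg, sd = baseline(samples)
    return abs(value - avg) > k * sd

# Hypothetical hourly CPU-utilization percentages collected over time.
history = [22.0, 25.0, 24.0, 23.0, 26.0, 24.0]

print(is_anomalous(25.0, history))  # a reading close to the baseline
print(is_anomalous(90.0, history))  # a reading far above the baseline
```

The more history you keep, the more meaningful the baseline becomes; a handful of samples like this is only enough to show the idea.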
Service availability monitoring shows events of interest as they occur, and is used to spot trouble. This needs an alerting system to be useful.
Monitor everything deemed important. A resource is important if you'll get into trouble with your PHB (pointy-haired boss) if that resource runs out. In a regulated industry some things must be monitored, and others cannot be. Some things should be monitored historically for audit/security purposes.
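To make "a resource that runs out" concrete, here is a minimal sketch of a threshold check on disk space. The 90% limit, the function names, and the alert text are assumptions chosen for illustration; a real alerting system would deliver the message somewhere rather than print it.

```python
import shutil

def over_threshold(used, total, max_used_fraction=0.90):
    """True if the used fraction of a resource exceeds the allowed limit."""
    return used / total > max_used_fraction

def disk_alert(path=".", max_used_fraction=0.90):
    """Return an alert string if the filesystem holding path is too full, else None."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (in bytes)
    if over_threshold(usage.used, usage.total, max_used_fraction):
        return f"ALERT: {path} is {usage.used / usage.total:.0%} full"
    return None

# Prints an alert string, or None if the filesystem is under the limit.
print(disk_alert("."))
```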
You need a policy that:
Useful tools include: top, df, free, sar, ac, w, hdparm, smartctl, i2cdump, sensors, logwatch, logcheck, swatch, logsurfer, and webalizer.
An important feature of monitoring and alerting tools is automatic escalation.
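Automatic escalation can be pictured as a table of time limits: the longer an alert goes unacknowledged, the more people get notified. The sketch below is one possible way to express that; the contact names and the 15 and 60 minute limits are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # minutes an alert may stay unacknowledged before this step applies
    contact: str        # who is notified at this step (hypothetical roles)

# Hypothetical escalation policy, ordered from first responder upward.
POLICY = [
    EscalationStep(0, "on-call admin"),
    EscalationStep(15, "senior admin"),
    EscalationStep(60, "IT manager"),
]

def contacts_to_notify(minutes_unacknowledged, policy=POLICY):
    """Return every contact whose escalation limit has been reached."""
    return [step.contact for step in policy
            if minutes_unacknowledged >= step.after_minutes]

print(contacts_to_notify(5))   # only the first responder so far
print(contacts_to_notify(20))  # escalated one level
```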
Be careful your network bandwidth isn't used up by your monitoring data. Aim for no more than 1% used, especially over slower links.
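The 1% guideline is easy to check with a little arithmetic. The sketch below estimates what fraction of a link the monitoring traffic consumes; the poll size, polling rate, and host count are made-up figures, and the function name is hypothetical. The link speed used is that of a T1 line (1.544 Mbit/s).

```python
def monitoring_share(bytes_per_poll, polls_per_minute, hosts, link_bits_per_second):
    """Return the fraction of link capacity consumed by monitoring traffic."""
    # Convert bytes per poll into an average bit rate across all hosts.
    bits_per_second = bytes_per_poll * 8 * polls_per_minute * hosts / 60
    return bits_per_second / link_bits_per_second

T1_BPS = 1_544_000  # T1 line rate in bits per second

# Assumed workload: 2000 bytes per poll, one poll per minute per host.
for hosts in (50, 100):
    share = monitoring_share(2000, 1, hosts, T1_BPS)
    print(f"{hosts} hosts: {share:.2%} of the link")
```

With these assumed numbers, 50 hosts stay under the 1% target and 100 hosts exceed it, so polling less often or sending less data per poll would be needed over a link that slow.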
Use SNMP to monitor remote servers. Summarize remote network monitoring with RMON.
(SNMP components: GET and SET commands, a management console, and a MIB for each type of component to monitor.)
See the Cisco SNMP tutorial for more information.
(Command-line tools: snmp*, arpsnmp, net-snmp-config.)
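As a small example of an SNMP GET, the sketch below wraps the Net-SNMP snmpget command-line tool. It assumes Net-SNMP is installed, that the remote agent speaks SNMP v2c, and that the community string is "public"; the host name and the helper function names are hypothetical. The OID shown is sysUpTime.0 from the standard MIB-2 system group.

```python
import subprocess

SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0 (MIB-2 system group)

def snmpget_command(host, oid, community="public"):
    """Build the argument list for an snmpget call using SNMP v2c."""
    return ["snmpget", "-v", "2c", "-c", community, host, oid]

def snmp_get(host, oid, community="public", timeout=5):
    """Run snmpget and return its raw text output (needs Net-SNMP installed)."""
    result = subprocess.run(
        snmpget_command(host, oid, community),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout.strip()

# Show the command that would be run against a hypothetical host.
print(" ".join(snmpget_command("router1.example.com", SYS_UPTIME_OID)))
```

Calling snmp_get() actually contacts the agent, so try it only against a device you are allowed to query.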
Related to log files is process accounting, which was used to track system use so that system expenses could be allocated (say, by department).
Full accounting can take a lot of CPU time, RAM, and disk space.
See the sar command for details.