The file /etc/cron.daily/00webalizer is a shell script that runs automatically each day to analyze the Apache web server logs. (To see the results, point your web browser to http://localhost/usage.)
One problem I've had was a repeated hacker attempt,
apparently by some script kiddie, to trigger
a buffer overflow in the web server by sending
extremely long URLs.
Such URLs cause no problems for the Apache web server itself, but some tools, such as webalizer, have difficulty when the log files contain very long lines.
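One quick way to check whether a log contains such lines is to count the lines longer than some threshold. This is only a sketch: count_long_lines is a hypothetical helper name, and the 1000-character threshold is an arbitrary assumption, not a documented webalizer limit.

```shell
#!/bin/sh
# count_long_lines FILE [MAX]
# Hypothetical helper: print how many lines in FILE are longer than MAX
# characters. MAX defaults to 1000, which is an arbitrary choice.
count_long_lines() {
    awk -v max="${2:-1000}" 'length($0) > max { n++ } END { print n + 0 }' "$1"
}

# Only inspect the log if it exists and is readable on this system.
if [ -r /var/log/httpd/access_log ]
then
    count_long_lines /var/log/httpd/access_log 1000
fi
```

A result of 0 suggests the log has no unusually long lines at that threshold.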
To solve the problem, I use Perl to pre-process the web log file just before starting webalizer.
The first Perl command looks for URLs in which a single character repeats many times (51 or more times in a row). The second looks for a repeating short sequence of \x.., which means a backslash, an x, and any two characters. In both cases the marker ...REPEAT... replaces most of the repeated part of the URL.
The modified script that cron runs appears below.
#!/bin/sh -
# update access statistics for the web site
#
# /etc/cron.daily/00webalizer, modified by WP 8/04
# $Id: 00webalizer,v 1.1 2005/03/07 17:20:40 root Exp $

if [ -s /var/log/httpd/access_log ]
then
    # Trim long URLs with a repeating character:
    perl -pi -e 's/(.)\1{50,}.*(.{50})$/$1$1...REPEAT...$1$1...$2/' \
        /var/log/httpd/access_log

    # Trim long URLs with repeating sequences of '\x..':
    # ($2 is the optional second '\x..'; $3 is the last 50 characters)
    perl -pi -e 's/(\\x..(\\x..)?)\1{50,}.*(.{50})$/$1$1...REPEAT...$1$1...$3/' \
        /var/log/httpd/access_log
fi

/usr/bin/webalizer -Q