NAME
exclude_robot.pl - a simple script to filter robots out of logfiles
SYNOPSIS
exclude_robot.pl
    -url <robot exclusions URL>
    [ -exclusions_file <exclusions file> ]
    <httpd log file>
OR
cat <httpd log file> | exclude_robot.pl -url <robot exclusions URL>
DESCRIPTION
This script filters HTTP log files to exclude entries that correspond to
known webbots, spiders, and other undesirables. The script requires a URL
as a command-line option. That URL should point to a text file containing
a newline-separated list of lowercase strings to match against when
identifying bots. This format is based on the one used by ABC
(<http://www.abc.org.uk/exclusionss/exclude.html>).
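A minimal Python sketch of how such a list might be loaded and matched follows. It assumes that blank lines are skipped and that an entry counts as a robot when any listed string occurs as a substring of its lowercased user-agent string. The function names parse_exclusions and is_robot are hypothetical. This is not the script's own Perl code, and the ABC file may carry conventions (such as comments) that are not handled here.

```python
def parse_exclusions(text):
    """Turn a newline-separated list into lowercase match strings.

    Assumption: surrounding whitespace is stripped and blank lines are ignored.
    """
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def is_robot(user_agent, exclusions):
    """Return True if any exclusion string occurs in the lowercased user agent."""
    agent = user_agent.lower()
    return any(pattern in agent for pattern in exclusions)


patterns = parse_exclusions("googlebot\nslurp\n\n")
print(patterns)  # ['googlebot', 'slurp']
print(is_robot("Mozilla/5.0 (compatible; Googlebot/2.1)", patterns))  # True
print(is_robot("Mozilla/5.0 (Windows NT 10.0)", patterns))  # False
```

Lowercasing the user agent before matching is what lets the list itself stay all-lowercase.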
The script reads httpd logfile entries either from a file named on the
command line or from STDIN, and writes the filtered entries to STDOUT.
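The overall filtering loop could be sketched in Python as follows. This sketch assumes that entries are in the Apache "combined" log format, where the user agent is the last double-quoted field, and it falls back to the whole line otherwise. It uses a simple lowercase substring test. The names filter_log and user_agent_of are hypothetical, and the Perl script may locate the user agent differently.

```python
import re

# Assumption: in the combined log format the user agent is the last
# double-quoted field on the line (optionally followed by the newline).
_LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def user_agent_of(line):
    """Return the last quoted field of a log line, or the whole line if none."""
    match = _LAST_QUOTED.search(line)
    return match.group(1) if match else line


def filter_log(lines, exclusions, out, excluded_out=None):
    """Copy non-robot lines to out; optionally copy robot lines to excluded_out.

    Returns the number of lines kept.
    """
    kept = 0
    for line in lines:
        agent = user_agent_of(line).lower()
        if any(pattern in agent for pattern in exclusions):
            if excluded_out is not None:
                excluded_out.write(line)
        else:
            out.write(line)
            kept += 1
    return kept
```

Calling filter_log(sys.stdin, patterns, sys.stdout) mirrors the STDIN-to-STDOUT pipeline shown in the SYNOPSIS. Passing an open file as excluded_out corresponds roughly to the -exclusions_file option.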
OPTIONS
-url <robot exclusions URL>
    Specify the URL of the file to fetch, which contains the list of
    agents to exclude. This option is REQUIRED.
-exclusions_file <exclusions file>
    Specify a file in which to save the entries excluded from the
    logfile. This option is OPTIONAL.
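The option handling could be sketched with Python's argparse module, which accepts single-dash long options such as -url. The helpers build_parser and fetch_exclusions are hypothetical. Fetching the list with urllib.request and decoding it as UTF-8 are assumptions and do not describe how the Perl script actually retrieves the list.

```python
import argparse
import urllib.request


def build_parser():
    """Mirror the documented options: -url required, -exclusions_file optional."""
    parser = argparse.ArgumentParser(prog="exclude_robot.pl")
    parser.add_argument("-url", required=True,
                        help="URL of the newline-separated list of agents to exclude")
    parser.add_argument("-exclusions_file",
                        help="optional file in which to save excluded entries")
    parser.add_argument("logfile", nargs="?",
                        help="httpd log file; STDIN is read if omitted")
    return parser


def fetch_exclusions(url):
    """Download the exclusion list and return its non-blank lines, lowercased."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8", errors="replace")
    return [line.strip().lower() for line in text.splitlines() if line.strip()]
```

If -url is missing, parse_args prints a usage error and exits, which matches the REQUIRED status above. When logfile is omitted, the caller would read STDIN instead.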
AUTHOR
Ave Wrigley <Ave.Wrigley@itn.co.uk>
COPYRIGHT
Copyright (c) 2001 Ave Wrigley. All rights reserved. This program is free
software; you can redistribute it and/or modify it under the same terms as
Perl itself.