NAME
webalizer - A web server log file analysis tool.
SYNOPSIS
webalizer [ option ... ] [ log-file ]
webazolver [ option ... ] [ log-file ]
DESCRIPTION
The Webalizer is a web server log file
analysis program which produces usage statistics in HTML format for
viewing with a browser. The results are presented in both columnar
and graphical format, which facilitates interpretation. Yearly,
monthly, daily and hourly usage statistics are presented, along
with the ability to display usage by site, URL, referrer, user
agent (browser), username, search strings, entry/exit pages, and
country (some information may not be available if not present in
the log file being processed).
The Webalizer supports CLF (common log format) log
files, as well as Combined log formats as defined by NCSA
and others, and variations of these which it attempts to handle
intelligently. In addition, The Webalizer supports
wu-ftpd xferlog formatted log files, allowing
analysis of FTP servers, as well as squid proxy logs. Logs may
also be compressed with gzip. If a compressed log file is
detected, it is automatically uncompressed as it is read.
Compressed logs must have the standard gzip extension of
.gz.
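The input rules above, together with the STDIN rules given under RUNNING THE WEBALIZER, can be illustrated with a short Python sketch. This is not The Webalizer's own C code. The function name open_log and the latin-1 decoding are assumptions made only for this example.

```python
import gzip
import sys
from typing import Optional, TextIO


def open_log(name: Optional[str]) -> TextIO:
    """Hypothetical helper: open a log file the way this page describes."""
    # No log file, or the special name '-', means read from STDIN.
    if name is None or name == "-":
        return sys.stdin
    # Compression is detected only by the standard gzip ".gz" extension,
    # and the data is decompressed on the fly while it is read.
    # latin-1 is an assumption so that unusual bytes never raise errors.
    if name.endswith(".gz"):
        return gzip.open(name, "rt", encoding="latin-1")
    return open(name, "r", encoding="latin-1")
```

Because detection is by file name only, a gzip file without the .gz suffix would not be decompressed by this sketch.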
webazolver is normally just a symbolic link to the
webalizer. When run as webazolver, only DNS file
creation/updates are performed, and the program will exit once
complete. All normal options and configuration directives are
available, although many will not be used. In addition, a DNS cache
file must be specified. If the number of DNS child processes to
use is not specified, webazolver will default to
5.
This documentation applies to The Webalizer Version 2.01.
RUNNING THE WEBALIZER
The Webalizer was designed to
be run from a Unix command line prompt or as a
cron job. Once executed, the general flow of the program is:
- A default configuration file is scanned for. A file named
webalizer.conf is searched for in the current directory and,
if found and owned by the invoking user, its configuration
data is parsed. If the file is not present in the current
directory, the file /etc/webalizer.conf is searched for and,
if found, is used instead.
- Any command line arguments given to the program are parsed.
This may include the specification of a configuration file, which
is processed at the time it is encountered.
- If a log file was specified, it is opened and made ready for
processing. If no log file was given, STDIN is used for
input. If the log filename '-' is specified, STDIN
will be forced.
- If an output directory was specified, the program changes to
that directory in preparation for generating output. If no output
directory was given, the current directory is used.
- If a non-zero number of DNS child processes was specified,
they will be started, and the specified log file will be processed,
creating or updating the specified DNS cache file.
- If no hostname was given, the program attempts to get the
hostname via a system call. If that fails, localhost is used.
- A history file is searched for in the current directory (output
directory) and read if found. This file keeps totals for previous
months, which are used in the main index.html HTML document.
Note: The file location can now be specified with the
HistoryName configuration option.
- If incremental processing was specified, a data file is
searched for and loaded if found, containing the 'internal state'
data of the program at the end of a previous run. Note: The
file location can now be specified with the IncrementalName
configuration option.
- Main processing begins on the log file. If the log spans
multiple months, a separate HTML document is created for each
month.
- After main processing, the main index.html page is
created, which has totals by month and links to each month's HTML
document.
- A new history file is saved to disk, which includes totals
generated by The Webalizer during the current run.
- If incremental processing was specified, a data file is written
that contains the 'internal state' data at the end of this
run.
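The default configuration lookup in the first step can be sketched in Python as follows. The function name and its parameters are hypothetical. This page does not say what happens when ./webalizer.conf exists but belongs to another user; the sketch assumes that no default file is used in that case.

```python
import os
from typing import Optional


def find_default_config(cwd: str = ".",
                        etc_path: str = "/etc/webalizer.conf") -> Optional[str]:
    """Hypothetical helper: pick the default config file, or None."""
    local = os.path.join(cwd, "webalizer.conf")
    if os.path.exists(local):
        # Present locally: honoured only if owned by the invoking user.
        # Assumption: an unowned local file disables the default entirely.
        return local if os.stat(local).st_uid == os.getuid() else None
    # Not present locally: fall back to the system-wide file.
    return etc_path if os.path.exists(etc_path) else None
```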
INCREMENTAL PROCESSING
Version 1.2x of The Webalizer
added incremental run capability. Simply put, this allows processing
large log files by breaking them up into smaller pieces, and
processing these pieces instead. What this means in real terms is
that you can now rotate your log files as often as you want, and
still be able to produce monthly usage statistics without the loss
of any detail. Basically, The Webalizer saves and restores
all internal data in a file named webalizer.current. This
allows the program to 'start where it left off' so to speak, and
allows the preservation of detail from one run to the next. The
data file is placed in the current output directory, and is a plain
ASCII text file that can be viewed with any standard text editor.
Its location and name may be changed using the
IncrementalName configuration keyword.
Some special precautions need to be taken when using the
incremental run capability of The Webalizer. Configuration
options should not be changed between runs, as that could cause
corruption of the internal data stored. For example, changing the
MangleAgents level will cause different representations of
user agents to be stored, producing invalid results in the user
agents section of the report. If you need to change configuration
options, do it at the end of the month after normal processing of
the previous month and before processing the current month. You may
also want to delete the webalizer.current file.
The Webalizer also attempts to prevent data duplication
by keeping track of the timestamp of the last record processed.
This timestamp is then compared to the records currently being
processed, and any records logged at or before that timestamp are
ignored. In theory, this should allow you to re-process logs that
have already been processed, or process logs that contain a mix of
processed and not yet processed records, without duplicating
statistics. The only time this may break is if you have duplicate
timestamps in two separate log files: any records in the second
log file that have the same timestamp as the last record in the
previously processed log file will be discarded as if they had
already been processed. There are several ways to prevent this;
for example, stopping the web server before rotating logs
will prevent this situation. This setup also requires that you
always process logs in chronological order; otherwise, data loss
will occur as a result of the timestamp comparison.
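The duplicate check can be modelled as shown below. This is a minimal sketch, not The Webalizer's implementation. It assumes records arrive as hypothetical (timestamp, request) pairs in chronological order, and it compares each record only against the timestamp saved by the previous run, which is why records that share that exact timestamp in a new file are lost.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Record = Tuple[datetime, str]  # hypothetical (timestamp, request) pair


def filter_new_records(records: List[Record],
                       saved_ts: Optional[datetime]):
    """Drop records at or before the timestamp saved by the last run."""
    kept = []
    for ts, req in records:
        # Treated as already processed: this also catches equal timestamps.
        if saved_ts is not None and ts <= saved_ts:
            continue
        kept.append((ts, req))
    # The newest kept timestamp is what would be saved for the next run.
    new_saved = kept[-1][0] if kept else saved_ts
    return kept, new_saved
```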
REVERSE DNS LOOKUPS
The Webalizer supports reverse DNS
lookups through a DNS cache file that is either
created/updated at run time or was created beforehand, either
by a previous run of the webalizer or by running the
stand-alone version, webazolver. In order to perform reverse
DNS lookups, a DNSCache filename must be specified. In order
to create/update the cache file at run time, the DNSChildren
number must be non-zero. The DNSChildren value specifies the
number of child processes to fork, each of which will perform
reverse DNS lookups in order to create/update the DNS cache file.
See the file DNS.README for additional information.
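A rough Python model of the cache update follows. It deliberately swaps the forked DNS children for a thread pool (concurrent.futures.ThreadPoolExecutor) and keeps the cache as an in-memory dict. The function names, and the choice to cache unresolvable addresses as themselves, are assumptions; this does not reflect The Webalizer's cache file format.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def default_resolver(ip: str) -> str:
    """Reverse-resolve one address; fall back to the address itself."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # Assumption: failed lookups are cached as the bare address.
        return ip


def update_dns_cache(ips: Iterable[str], cache: Dict[str, str],
                     children: int,
                     resolver: Callable[[str], str] = default_resolver):
    """Resolve addresses missing from the cache with `children` workers."""
    if children <= 0:
        # Like DNSChildren 0: use the existing cache, never update it.
        return cache
    todo = sorted({ip for ip in ips if ip not in cache})
    with ThreadPoolExecutor(max_workers=children) as pool:
        for ip, name in zip(todo, pool.map(resolver, todo)):
            cache[ip] = name
    return cache
```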
COMMAND LINE OPTIONS
The Webalizer supports many different
configuration options that will alter the way the program behaves
and generates output. Most of these can be specified on the command
line, while some can only be specified in a configuration file. The
command line options are listed below, with references to the
corresponding configuration file keywords.
General Options
- -h
- Display all available command line options and exit program.
- -v -V
- Display program version and exit program.
- -d
- Debug. Display debugging information for errors and
warnings.
- -i
- IgnoreHist. Ignore history. USE WITH CAUTION.
This will cause The Webalizer to ignore any previous monthly
history file only. Incremental data (if present) is still
processed.
- -p
- Incremental. Preserve internal data between runs.
- -q
- Quiet. Suppress informational messages. Does not suppress
warnings or errors.
- -Q
- ReallyQuiet. Suppress all messages, including warnings and
errors.
- -T
- TimeMe. Force display of timing information at end of
processing.
- -c file
- Use configuration file file.
- -n name
- HostName. Use the hostname name.
- -o dir
- OutputDir. Use output directory dir.
- -t name
- ReportTitle. Use name for report title.
- -F ( clf | ftp | squid )
- LogType. Specify log type to be processed. Value can be
clf, ftp or squid. If not
specified, defaults to CLF format. FTP logs must
be in standard wu-ftpd xferlog format.
- -f
- FoldSeqErr. Fold out of sequence log records back into
analysis by treating them as if they had the same date/time as the
last good record. Normally, out of sequence log records are simply
ignored.
- -Y
- CountryGraph. Suppress country graph.
- -G
- HourlyGraph. Suppress hourly graph.
- -x name
- HTMLExtension. Defines HTML file extension to use. If
not specified, defaults to html. Do not include the leading
period.
- -H
- HourlyStats. Suppress hourly statistics.
- -L
- GraphLegend. Suppress color coded graph legends.
- -l num
- GraphLines. Specify number of background lines. Default
is 2. Use zero ('0') to disable the lines.
- -P name
- PageType. Specify file extensions that are considered
pages (sometimes referred to as pageviews).
- -m num
- VisitTimeout. Specify the Visit timeout period.
Specified in number of seconds. Default is 1800 seconds (30
minutes).
- -I name
- IndexAlias. Use the filename name as an
additional alias for index.*.
- -M num
- MangleAgents. Mangle user agent names according to the
mangle level specified by num. Mangle levels are:
    5: Browser name and major version.
    4: Browser name, major and minor version.
    3: Browser name, major version, minor version to two
       decimal places.
    2: Browser name, major and minor versions and
       sub-version.
    1: Browser name, version and machine type if possible.
    0: All information (left unchanged).
- -g num
- GroupDomains. Automatically group sites by domain. The
grouping level specified by num can be thought of as 'the
number of dots' to display in the grouping. The default value of
0 disables any domain grouping.
- -D name
- DNSCache. Use the DNS cache file name.
- -N num
- DNSChildren. Use num DNS child processes to
perform DNS lookups, either creating or updating the DNS cache
file. Specify zero (0) to disable cache file
creation/updates. If given, a DNS cache filename must also be
specified.
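The "number of dots" rule used by -g (GroupDomains) can be read as keeping the last num+1 labels of a hostname. The Python sketch below assumes that reading, and also assumes that numeric IP addresses are left ungrouped; the function name is hypothetical.

```python
def group_domain(host: str, level: int) -> str:
    """Hypothetical helper: the domain group shown for `host`."""
    if level <= 0:
        return host  # 0 disables domain grouping
    labels = host.split(".")
    # Assumption: purely numeric addresses are never grouped.
    if all(label.isdigit() for label in labels):
        return host
    # Keep the last level+1 labels, which displays exactly `level` dots.
    return ".".join(labels[-(level + 1):])
```

For example, level 1 maps www.cs.example.edu to example.edu, and level 2 maps it to cs.example.edu.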
Hide Options
- -a name
- HideAgent. Hide user agents matching name.
- -r name
- HideReferrer. Hide referrer matching name.
- -s name
- HideSite. Hide site matching name.
- -X
- HideAllSites. Hide all individual sites (only display
groups).
- -u name
- HideURL. Hide URL matching name.
Table size options
- -A num
- TopAgents. Display the top num user agents table.
- -R num
- TopReferrers. Display the top num referrers
table.
- -S num
- TopSites. Display the top num sites table.
- -U num
- TopURLs. Display the top num URLs table.
- -C num
- TopCountries. Display the top num countries
table.
- -e num
- TopEntry. Display the top num entry pages table.
- -E num
- TopExit. Display the top num exit pages
table.
CONFIGURATION FILES
Configuration files are standard
text files that may be created or edited using any standard editor.
Blank lines and lines that begin with a pound sign ('#') are
ignored. Any other lines are considered to be configuration lines,
and have the form "Keyword Value", where the 'Keyword' is one of
the currently available configuration keywords defined below, and
'Value' is the value to assign to that particular option. Any text
found after the keyword up to the end of the line is considered the
keyword's value, so you should not include anything after the
actual value on the line that is not actually part of the value
being assigned. The file sample.conf provided with the
distribution contains lots of useful documentation and examples as
well.
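The line format described above can be sketched as a small Python parser. This is an illustration, not the parser shipped with The Webalizer. It assumes leading whitespace is stripped before the '#' check, and it keeps repeated keywords such as HideSite as separate entries.

```python
from typing import Iterable, List, Tuple


def parse_config(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (Keyword, Value) pairs in file order."""
    entries = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        # Keyword is the first word; the value is the rest of the line.
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        entries.append((parts[0], value))
    return entries
```

Note that a trailing comment such as "Incremental yes # on" would become part of the value "yes # on", which is why nothing else should follow the value on a line.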
General Configuration Keywords
- LogFile name
- Use log file named name. If none specified, STDIN
will be used.
- LogType name
- Specify log file type as name. Values can be
web, squid or ftp, with the default being
web.
- OutputDir dir
- Create output in the directory dir. If none specified,
the current directory will be used.
- HistoryName name
- Filename to use for history file. Relative to output directory
unless absolute name is given (ie: starts with '/'). Defaults to
'webalizer.hist' in the standard output directory.
- ReportTitle name
- Use the title string name for the report title. If none
specified, use the default of (in English) "Usage Statistics
for ".
- Hostname name
- Set the hostname for the report as name. If none
specified, an attempt will be made to gather the hostname via a
system call. If that fails, localhost will be used.
- UseHTTPS ( yes | no )
- Use https:// on links to URLs, instead of the default
http://, in the 'Top URL's' table.
- Quiet ( yes | no )
- Suppress informational messages. Warning and Error messages will
not be suppressed.
- ReallyQuiet ( yes | no )
- Suppress all messages, including Warning and Error messages.
- Debug ( yes | no )
- Print extra debugging information on Warnings and Errors.
- TimeMe ( yes | no )
- Force timing information at end of processing.
- GMTTime ( yes | no )
- Use GMT (UTC) time instead of local timezone for
reports.
- IgnoreHist ( yes | no )
- Ignore previous monthly history file. USE WITH CAUTION.
Does not prevent Incremental file processing.
- FoldSeqErr ( yes | no )
- Fold out of sequence log records back into analysis by treating
them as if they had the same date/time as the last good record.
Normally, out of sequence log records are ignored.
- CountryGraph ( yes | no )
- Display Country Usage Graph in output report.
- DailyGraph ( yes | no )
- Display Daily Graph in output report.
- DailyStats ( yes | no )
- Display Daily Statistics in output report.
- HourlyGraph ( yes | no )
- Display Hourly Graph in output report.
- HourlyStats ( yes | no )
- Display Hourly Statistics in output report.
- PageType name
- Define the file extensions to consider as a page. If a
file is found to have the same extension as name, it will be
counted as a page (sometimes called a pageview).
- GraphLegend ( yes | no )
- Allows the color coded graph legends to be enabled/disabled.
- GraphLines num
- Specify the number of background reference lines displayed on
the graphs produced. Disable by using zero ('0'), default is
2.
- VisitTimeout num
- Specifies the visit timeout value. Default is 1800
seconds (30 minutes). A visit is determined by looking at the
difference in time between the current and last request from a
specific site. If the difference is greater than or equal to the
timeout value, the request is counted as a new visit. Specified in
seconds.
- IndexAlias name
- Use name as an additional alias for index.*.
- MangleAgents num
- Mangle user agent names based on mangle level num. See
the -M command line switch for mangle levels and their
meaning. The default is 0, which doesn't mangle user agents
at all.
- SearchEngine name variable
- Allows the specification of search engines and their query
strings. The name is the name to match against the referrer
string for a given search engine. The variable is the CGI
variable that the search engine uses for queries. See the
sample.conf file for example usage with common search
engines.
- Incremental ( yes | no )
- Enable Incremental mode processing.
- IncrementalName name
- Filename to use for incremental data. Relative to output
directory unless an absolute name is given (ie: starts with '/').
Defaults to 'webalizer.current' in the standard output
directory.
- DNSCache name
- Filename to use for the DNS cache. Relative to output directory
unless an absolute name is given (ie: starts with '/').
- DNSChildren num
- Number of child DNS processes to run in order to
create/update the DNS cache file. Specify zero (0) to
disable.
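The visit rule described under VisitTimeout can be expressed as the following Python sketch. The input format, a chronological list of hypothetical (site, seconds) pairs, and the function name are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def count_visits(requests: Iterable[Tuple[str, int]],
                 timeout: int = 1800) -> int:
    """Count visits from (site, unix_seconds) pairs in time order."""
    last_seen: Dict[str, int] = {}
    visits = 0
    for site, t in requests:
        prev = last_seen.get(site)
        # A first request, or a gap >= timeout, starts a new visit.
        if prev is None or t - prev >= timeout:
            visits += 1
        last_seen[site] = t
    return visits
```

With the default of 1800 seconds, requests from one site at 0, 1799 and 3599 seconds count as two visits, because only the second gap reaches the timeout.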
Top Table Keywords
- TopAgents num
- Display the top num User Agents table. Use zero to
disable.
- AllAgents ( yes | no )
- Create separate HTML page with All User Agents.
- TopReferrers num
- Display the top num Referrers table. Use zero to
disable.
- AllReferrers ( yes | no )
- Create separate HTML page with All Referrers.
- TopSites num
- Display the top num Sites table. Use zero to disable.
- TopKSites num
- Display the top num Sites (by KByte) table. Use zero to
disable.
- AllSites ( yes | no )
- Create separate HTML page with All Sites.
- TopURLs num
- Display the top num URLs table. Use zero to disable.
- TopKURLs num
- Display the top num URLs (by KByte) table. Use zero to
disable.
- AllURLs ( yes | no )
- Create separate HTML page with All URLs.
- TopCountries num
- Display the top num Countries in the table. Use zero to
disable.
- TopEntry num
- Display the top num Entry Pages in the table. Use zero
to disable.
- TopExit num
- Display the top num Exit Pages in the table. Use zero to
disable.
- TopSearch num
- Display the top num Search Strings in the table. Use
zero to disable.
- AllSearchStr ( yes | no )
- Create separate HTML page with All Search Strings.
- TopUsers num
- Display the top num Usernames in the table. Use zero to
disable. Usernames are only available if using http based
authentication.
- AllUsers ( yes | no )
- Create separate HTML page with All Usernames.
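The "top num, zero disables" behaviour shared by the Top* keywords can be sketched with collections.Counter. The function name and the input dictionary are hypothetical. Ties are returned in first-seen order, which may differ from The Webalizer's own ordering.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_table(counts: Dict[str, int], num: int) -> List[Tuple[str, int]]:
    """Rows for a Top* table; a num of zero disables the table."""
    if num <= 0:
        return []
    # most_common(n) returns the n largest counts, highest first.
    return Counter(counts).most_common(num)
```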
Hide/Ignore/Group/Include Keywords
- HideAgent name
- Hide User Agents that match name.
- HideReferrer name
- Hide Referrers that match name.
- HideSite name
- Hide Sites that match name.
- HideAllSites ( yes | no )
- Hide all individual sites. This causes only grouped sites to be
displayed.
- HideURL name
- Hide URLs that match name.
- HideUser name
- Hide Usernames that match name.
- IgnoreAgent name
- Ignore User Agents that match name.
- IgnoreReferrer name
- Ignore Referrers that match name.
- IgnoreSite name
- Ignore Sites that match name.
- IgnoreURL name
- Ignore URLs that match name.
- IgnoreUser name
- Ignore Usernames that match name.
- GroupAgent name [Label]
- Group User Agents that match name. Display Label
in 'Top Agent' table if given (instead of name).
- GroupReferrer name [Label]
- Group Referrers that match name. Display Label in
'Top Referrer' table if given (instead of name).
- GroupSite name [Label]
- Group Sites that match name. Display Label in
'Top Site' table if given (instead of name).
- GroupDomains num
- Automatically group sites by domain. The value num
specifies the level of grouping, and can be thought of as the
'number of dots' to be displayed. The default value of 0
disables domain grouping.
- GroupURL name [Label]
- Group URLs that match name. Display Label in
'Top URL' table if given (instead of name).
- GroupUser name [Label]
- Group Usernames that match name. Display Label in
'Top Usernames' table if given (instead of name).
- IncludeSite name
- Force inclusion of sites that match name. Takes
precedence over Ignore* keywords.
- IncludeURL name
- Force inclusion of URLs that match name. Takes
precedence over Ignore* keywords.
- IncludeReferrer name
- Force inclusion of Referrers that match name. Takes
precedence over Ignore* keywords.
- IncludeAgent name
- Force inclusion of User Agents that match name. Takes
precedence over Ignore* keywords.
- IncludeUser name
- Force inclusion of Usernames that match name. Takes
precedence over Ignore* keywords.
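The precedence of Include* over Ignore* keywords can be sketched as follows. The matching here is plain substring matching, which is an assumption made for illustration; the sample.conf file describes the matching rules The Webalizer actually applies. The function names are hypothetical.

```python
from typing import Iterable


def matches(value: str, patterns: Iterable[str]) -> bool:
    # Assumption: simple substring matching, no wildcard handling.
    return any(p in value for p in patterns)


def keep_record(site: str, ignore: Iterable[str],
                include: Iterable[str]) -> bool:
    """Decide whether a site's record is counted."""
    if matches(site, include):
        return True  # Include* wins over any Ignore* match
    return not matches(site, ignore)
```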
HTML Generation Keywords
- HTMLExtension text
- Defines the HTML file extension to use. Default is html.
Do not include the leading period!
- HTMLPre text
- Insert text at the very beginning of the generated HTML
file. Defaults to a standard HTML 3.2 DOCTYPE record.
- HTMLHead text
- Insert text within the <HEAD></HEAD> block
of the HTML file.
- HTMLBody text
- Insert text in HTML page, starting with the <BODY>
tag. If used, the first line must be a <BODY ...> tag.
Multiple lines may be specified.
- HTMLPost text
- Insert text at top (before horizontal rule) of HTML pages.
Multiple lines may be specified.
- HTMLTail text
- Insert text at bottom of the HTML page. The text
is top and right aligned within a table column at the end of the
report.
- HTMLEnd text
- Insert text at the very end of the HTML page. If not
specified, the default is to insert the ending </BODY> and
</HTML> tags. If used, you must supply these tags
yourself.
Dump Object Keywords
The Webalizer allows you to export processed data to other
programs by using tab delimited text files. The Dump*
commands specify which files are to be written, and where.
- DumpPath name
- Save dump files in directory name. If not specified, the
default output directory will be used. Do not specify a trailing
slash (/).
- DumpExtension name
- Use name as the filename extension for dump files. If
not given, the default of tab will be used.
- DumpHeader ( yes | no )
- Print a column header as the first record of the file.
- DumpSites ( yes | no )
- Dump the sites data to a tab delimited file.
- DumpURLs ( yes | no )
- Dump the URL data to a tab delimited file.
- DumpReferrers ( yes | no )
- Dump the referrer data to a tab delimited file. This data is
only available if using a log that contains referrer information
(ie: a combined format web log).
- DumpAgents ( yes | no )
- Dump the user agent data to a tab delimited file. This data is
only available if using a log that contains user agent information
(ie: a combined format web log).
- DumpUsers ( yes | no )
- Dump the username data to a tab delimited file. This data is
only available if processing a wu-ftpd xferlog or a web log that
contains http authentication information.
- DumpSearchStr ( yes | no )
- Dump the search string data to a tab delimited file. This data
is only available if processing a web log that contains referrer
information and had search string information present.
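Dump files follow the xxxxx_YYYYMM naming shown under FILES. The Python sketch below builds such a path from DumpPath and DumpExtension. The prefix argument (for example "site") and the function name are hypothetical. Because the sketch inserts a '/' between the directory and the file name, a DumpPath ending in a slash would produce a doubled slash.

```python
def dump_filename(dump_path: str, prefix: str, year: int, month: int,
                  extension: str = "tab") -> str:
    """Hypothetical helper: build <DumpPath>/<prefix>_YYYYMM.<ext>."""
    # DumpPath is expected without a trailing slash, so one is added here.
    return f"{dump_path}/{prefix}_{year:04d}{month:02d}.{extension}"
```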
FILES
- webalizer.conf
- Default configuration file. Is searched for in the current
directory and if not found, in the /etc/ directory.
- webalizer.hist
- Monthly history file for previous 12 months. (can be changed)
- webalizer.current
- Current state data file (Incremental processing). (can be
changed)
- xxxxx_YYYYMM.html
- Various monthly HTML output files produced. (extension
can be changed)
- xxxxx_YYYYMM.png
- Various monthly image files used in the reports.
- xxxxx_YYYYMM.tab
- Monthly tab delimited text files. (extension can be
changed)
BUGS
Report bugs to .
COPYRIGHT
Copyright (C) 1997-2000 by Bradford L. Barrett.
Distributed under the GNU GPL. See the files "COPYING" and
"Copyright", supplied with all distributions for additional
information.
AUTHOR
Bradford L. Barrett <>