NAME 

linkchecker - check HTML documents for broken links

SYNOPSIS 

linkchecker [options] [file-or-url]...

DESCRIPTION 

LinkChecker features recursive checking; multithreading; output in colored or normal text, HTML, SQL, CSV or a sitemap graph in GML or XML; support for HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Gopher, Telnet and local file links; restriction of link checking with regular expression filters for URLs; proxy support; username/password authorization for HTTP and FTP; robots.txt exclusion protocol support; i18n support; a command line interface and a (Fast)CGI web interface (requires an HTTP server).

EXAMPLES 

The most common use checks the given domain recursively, plus any URL pointing outside of the domain:
linkchecker
Beware that this checks the whole site, which can have several hundred thousand URLs. Use the -r option to restrict the recursion depth.

Don't connect to mailto: hosts, only check their URL syntax. All other links are checked as usual:
linkchecker --ignore-url=^mailto:

Checking a local HTML file on Unix:
linkchecker ../bla.html

Checking a local HTML file on Windows:
linkchecker c:\temp\test.html

You can skip the http:// URL part if the domain starts with www.:
linkchecker

You can skip the ftp:// URL part if the domain starts with ftp.:
linkchecker -r0

Generate a sitemap graph and convert it with the Graphviz dot utility:
linkchecker -odot -v

OPTIONS 

General options 

-h, --help
Help me! Print usage information for this program.
-fFILENAME, --config=FILENAME
Use FILENAME as the configuration file. By default, LinkChecker first searches /etc/linkchecker/linkcheckerrc and then ~/.linkchecker/linkcheckerrc.
-I, --interactive
Ask for a URL if none is given on the command line.
-tNUMBER, --threads=NUMBER
Generate no more than the given number of threads. Default number of threads is 10. To disable threading specify a non-positive number.
--priority
Run with normal thread scheduling priority. By default, LinkChecker runs with low thread priority so that it is suitable as a background job.
-V, --version
Print version and exit.
--allow-root
Do not drop privileges when running as root user on Unix systems.

Output options 

-v, --verbose
Log all checked URLs. Default is to log only errors and warnings.
--no-warnings
Don't log warnings. Default is to log warnings.
-WREGEX, --warning-regex=REGEX
Define a regular expression that triggers a warning if it matches any content of the checked link. This applies only to valid pages, since only their content can be retrieved.

Use this to check for pages that contain some form of error, for example "This page has moved" or "Oracle Application Server error".

--warning-size-bytes=NUMBER
Print a warning if content size info is available and exceeds the given number of bytes.
-q, --quiet
Quiet operation, an alias for -o none. This is only useful with -F.
-oTYPE[/ENCODING], --output=TYPE[/ENCODING]
Specify output type as text, html, sql, csv, gml, dot, xml, none or blacklist. Default type is text. The various output types are documented below. The ENCODING specifies the output encoding, the default is that of your locale. Valid encodings are listed at .
-FTYPE[/ENCODING][/FILENAME], --file-output=TYPE[/ENCODING][/FILENAME]
Output to a file linkchecker-out.TYPE, $HOME/.linkchecker/blacklist for blacklist output, or FILENAME if specified. The ENCODING specifies the output encoding, the default is that of your locale. Valid encodings are listed at . For the none output type, the FILENAME and ENCODING parts are ignored; for all other types, an existing file is overwritten. You can specify this option more than once. Valid file output types are text, html, sql, csv, gml, dot, xml, none or blacklist. Default is no file output. The various output types are documented below. Note that you can suppress all console output with the option -o none.
--no-status
Do not print check status messages.
-DSTRING, --debug=STRING
Print debugging output for the given logger. Available loggers are cmdline, checking, cache, gui, dns and all. Specifying all is an alias for specifying all available loggers. The option can be given multiple times to debug with more than one logger. For accurate results, threading will be disabled during debug runs.
--trace
Print tracing information.
--profile
Write profiling data into a file named linkchecker.prof in the current working directory. See also --viewprof.
--viewprof
Print out previously generated profiling data. See also --profile.

Checking options 

-rNUMBER, --recursion-level=NUMBER
Recursively check all links up to the given depth. A negative depth enables infinite recursion. Default depth is infinite.
--no-follow-url=REGEX
Check but do not recurse into URLs matching the given regular expression. This option can be given multiple times.
--ignore-url=REGEX
Only check syntax of URLs matching the given regular expression. This option can be given multiple times.
-C, --cookies
Accept and send HTTP cookies according to RFC 2109. Only cookies which are sent back to the originating server are accepted. Sent and accepted cookies are provided as additional logging information.
-a, --anchors
Check HTTP anchor references. Default is not to check anchors.
--no-anchor-caching
Treat url#anchora and url#anchorb as equal on caching. This is the default browser behaviour, but it's not specified in the URI specification. Use with care since broken anchors are not guaranteed to be detected in this mode.
-uSTRING, --user=STRING
Try the given username for HTTP and FTP authorization. For FTP the default username is anonymous. For HTTP there is no default username. See also -p.
-pSTRING, --password=STRING
Try the given password for HTTP and FTP authorization. For FTP the default password is anonymous@. For HTTP there is no default password. See also -u.
--timeout=NUMBER
Set the timeout for connection attempts in seconds. The default timeout is 60 seconds.
-PNUMBER, --pause=NUMBER
Pause the given number of seconds between two subsequent connection requests to the same host. Default is no pause between requests.
-NSTRING, --nntp-server=STRING
Specify an NNTP server for news: links. Default is the environment variable NNTP_SERVER. If no host is given, only the syntax of the link is checked.
--no-proxy-for=REGEX
Contact hosts that match the given regular expression directly instead of going through a proxy. This option can be given multiple times.

OUTPUT TYPES 

Note that by default only errors and warnings are logged. You should use the --verbose option to get the complete URL list, especially when outputting a sitemap graph format.
text
Standard text logger, logging URLs in keyword: argument fashion.
html
Log URLs in keyword: argument fashion, formatted as HTML. Additionally has links to the referenced pages. Invalid URLs have HTML and CSS syntax check links appended.
csv
Log check result in CSV format with one URL per line.
gml
Log parent-child relations between linked URLs as a GML sitemap graph.
dot
Log parent-child relations between linked URLs as a DOT sitemap graph.
gxml
Log check result as a GraphXML sitemap graph.
xml
Log check result as machine-readable XML.
sql
Log check result as SQL script with INSERT commands. An example script to create the initial SQL table is included as create.sql.
blacklist
Suitable for cron jobs. Logs the check result into a file ~/.linkchecker/blacklist which only contains entries with invalid URLs and the number of times they have failed.
none
Logs nothing. Suitable for debugging or checking the exit code.

REGULAR EXPRESSIONS 

LinkChecker accepts only Python regular expressions. See for an introduction to regular expressions.

The only addition is that a leading exclamation mark negates the regular expression.
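
The negation rule can be sketched in Python as follows. This is a minimal illustration, not LinkChecker's actual code: the helper name url_matches is hypothetical, and the use of re.search (a match anywhere in the URL) is an assumption.

```python
import re

def url_matches(pattern: str, url: str) -> bool:
    """Return True if url satisfies pattern; a leading '!' negates the result.

    Hypothetical helper for illustration only.
    """
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]  # strip the exclamation mark before matching
    # Assumption: the pattern may match anywhere in the URL (re.search).
    found = re.search(pattern, url) is not None
    return not found if negate else found

# The plain pattern selects mailto: links, the negated one selects everything else.
print(url_matches("^mailto:", "mailto:calvin@users.sourceforge.net"))
print(url_matches("!^mailto:", "mailto:calvin@users.sourceforge.net"))
```

A negated pattern is a convenient way to express "everything except" without writing a lookahead.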

COOKIE FILES 

A cookie file contains standard RFC 822 header data with the following possible names:
Scheme (optional)
Sets the scheme the cookies are valid for; default scheme is http.
Host (required)
Sets the domain the cookies are valid for.
Path (optional)
Gives the path the cookies are valid for; default path is /.
Set-cookie (optional)
Set cookie name/value. Can be given more than once.

Multiple entries are separated by a blank line. The example below will send two cookies to all URLs starting with and one to all URLs starting with :

Host: imadoofus.org
Path: /hello
Set-cookie: ID="smee"
Set-cookie: spam="egg"

Scheme: https
Host: imaweevil.org
Set-cookie: baggage="elitist"; comment="hologram"
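
To make the entry format concrete, here is a minimal Python sketch that reads such a file with the standard library's email header parser. It is not LinkChecker's own parser: the function name parse_cookie_file and the dictionary layout are hypothetical, and the sketch assumes one "Name: value" header per line.

```python
import re
from email.parser import Parser

COOKIE_FILE = """\
Host: imadoofus.org
Path: /hello
Set-cookie: ID="smee"
Set-cookie: spam="egg"

Scheme: https
Host: imaweevil.org
Set-cookie: baggage="elitist"; comment="hologram"
"""

def parse_cookie_file(text):
    """Split text on blank lines and parse each entry as a header block.

    Hypothetical helper; applies the documented defaults
    (scheme http, path /) and requires a Host header.
    """
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        headers = Parser().parsestr(block, headersonly=True)
        if headers["Host"] is None:
            raise ValueError("cookie entry is missing the required Host header")
        entries.append({
            "scheme": headers.get("Scheme", "http"),
            "host": headers["Host"],
            "path": headers.get("Path", "/"),
            # Set-cookie may be given more than once, so collect every value.
            "cookies": headers.get_all("Set-cookie", []),
        })
    return entries

for entry in parse_cookie_file(COOKIE_FILE):
    print(entry)
```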

PROXY SUPPORT 

To use a proxy, set $http_proxy, $https_proxy, $ftp_proxy, $gopher_proxy on Unix or Windows to the proxy URL. The URL should be of the form http://[user:pass@]host[:port], for example http://localhost:8080, or http://joe:test@proxy.domain. On a Mac use the Internet Config to select a proxy.
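
The proxy URL form can be checked before a run with a short Python sketch like the one below. The helper name describe_proxy is hypothetical and the code only shows how the URL parts are read with the standard library; it makes no claim about how LinkChecker itself parses the variables.

```python
import os
from urllib.parse import urlsplit

def describe_proxy(env_var="http_proxy", environ=os.environ):
    """Return the parts of the proxy URL stored in env_var, or None if unset.

    Hypothetical helper for illustration only.
    """
    value = environ.get(env_var)
    if not value:
        return None
    parts = urlsplit(value)
    return {
        "host": parts.hostname,
        "port": parts.port,          # None when the URL has no :port part
        "user": parts.username,      # None when the URL has no user:pass@ part
        "password": parts.password,
    }

# The two example URLs from the text, passed in as a stand-in environment.
print(describe_proxy(environ={"http_proxy": "http://localhost:8080"}))
print(describe_proxy(environ={"http_proxy": "http://joe:test@proxy.domain"}))
```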

NOTES 

URLs on the command line starting with ftp. are treated like ftp://ftp., and URLs starting with www. are treated like http://www.. You can also give local files as arguments.

If you have your system configured to automatically establish a connection to the internet (e.g. with diald), it will connect when checking links not pointing to your local host. Use the -s and -i options to prevent this.

JavaScript links are currently ignored.

If your platform does not support threading, LinkChecker disables it automatically.

You can supply multiple user/password pairs in a configuration file.

When checking news: links the given NNTP host doesn't need to be the same as the host of the user browsing your pages.

ENVIRONMENT 

NNTP_SERVER - specifies default NNTP server
http_proxy - specifies default HTTP proxy server
ftp_proxy - specifies default FTP proxy server
LC_MESSAGES, LANG, LANGUAGE - specify output language

RETURN VALUE 

The return value is non-zero when
* invalid links were found, or
* link warnings were found and warnings are enabled, or
* a program error occurred.
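
A script can act on the return value as in the following Python sketch. The function name links_ok is hypothetical; the commented linkchecker invocation uses only options documented above, assumes linkchecker is on the PATH, and uses a placeholder URL.

```python
import subprocess
import sys

def links_ok(command):
    """Run command and report whether it exited with status zero.

    Hypothetical helper; a non-zero status covers invalid links,
    enabled warnings, and program errors alike.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

# Intended use (placeholder URL, requires linkchecker to be installed):
# links_ok(["linkchecker", "-o", "none", "--no-warnings", "http://www.example.org/"])

# Stand-in command so the sketch runs without linkchecker installed.
print(links_ok([sys.executable, "-c", "raise SystemExit(1)"]))
```

Because the return value does not distinguish the three cases, a caller that needs the reason has to inspect the logged output as well.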

FILES 

/etc/linkchecker/linkcheckerrc, ~/.linkchecker/linkcheckerrc - default configuration files
~/.linkchecker/blacklist - default blacklist logger output filename
linkchecker-out.TYPE - default logger file output name
- valid output encodings
- regular expression documentation

AUTHOR 

Bastian Kleineidam <calvin@users.sourceforge.net>

COPYRIGHT 

Copyright © 2000-2005 Bastian Kleineidam