NAME
normalize - pretty-print an HTML file
SYNOPSIS
normalize [ -x ] [ -e ] [ -d ] [ -i indent ] [ -l line-length ] [ file ]
DESCRIPTION
The normalize command pretty-prints an HTML file and also
tries to fix small errors. The output is the same HTML, but with a
maximum line length and with optional indentation to indicate the
nesting level of each line.
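To illustrate what indentation by nesting level means, here is a minimal Python sketch of the idea. It is not normalize's implementation: the Element class, the render and inline_html functions, and the INLINE set are hypothetical names for this example. The split between inline and block elements is an assumption loosely following the -i description below: inline elements such as EM and SPAN stay on the current line, while other elements start a new line one level deeper. Attributes, empty elements and whitespace significance are ignored.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical classification for this sketch; normalize's own element
# table is not documented here.
INLINE = {"em", "span", "a", "b", "i", "code"}


@dataclass
class Element:
    tag: str
    children: List[Union["Element", str]] = field(default_factory=list)


def inline_html(node):
    """Serialize a node on a single line (used for text and inline content)."""
    if isinstance(node, str):
        return node
    inner = "".join(inline_html(c) for c in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


def is_block(node):
    return isinstance(node, Element) and node.tag not in INLINE


def render(node, depth=0, indent=2):
    """Return output lines; each block element starts a line indented by depth."""
    pad = " " * (indent * depth)
    inner_pad = " " * (indent * (depth + 1))
    if not any(is_block(c) for c in node.children):
        # Only text and inline children: keep the whole element on one line.
        return [pad + inline_html(node)]
    lines = [pad + f"<{node.tag}>"]
    run = ""  # text and inline elements collected between block children
    for child in node.children:
        if is_block(child):
            if run.strip():
                # Note: this sketch discards surrounding whitespace of text runs.
                lines.append(inner_pad + run.strip())
            run = ""
            lines.extend(render(child, depth + 1, indent))
        else:
            run += inline_html(child)
    if run.strip():
        lines.append(inner_pad + run.strip())
    lines.append(pad + f"</{node.tag}>")
    return lines


if __name__ == "__main__":
    doc = Element("div", [
        Element("p", ["Some ", Element("em", ["emphasized"]), " text"]),
        Element("ul", [Element("li", ["one"]), Element("li", ["two"])]),
    ])
    print("\n".join(render(doc)))
    # <div>
    #   <p>Some <em>emphasized</em> text</p>
    #   <ul>
    #     <li>one</li>
    #     <li>two</li>
    #   </ul>
    # </div>
```

Passing indent=4 to render corresponds roughly to -i 4: every nesting level then adds four spaces instead of two.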
OPTIONS
The following options are supported:
- -x
- Use XML conventions: empty elements are written with a slash at
the end (for example, <IMG />). Implies -e.
- -e
- Always insert end tags, even where HTML does not require them
(for example, </p> and </li>).
- -d
- Omit the DOCTYPE from the output.
- -i indent
- Set the number of spaces by which each nesting level is indented.
The default is 2. Not all elements cause an indent: in general,
elements that can occur in a block context start on a new line and
cause an indent, but inline elements, such as EM and SPAN, do not.
- -l line-length
- Set the maximum line length. normalize wraps lines so
that each line is as long as possible, but no longer than this
length. The default is 72. Words longer than the line length are
not broken and will extend past it. The content of the STYLE,
SCRIPT and PRE elements is not line-wrapped.
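The wrapping rule for -l amounts to greedy line filling. The following Python sketch illustrates that rule on plain text only; it is not normalize's code, and wrap_words, format_content and NO_WRAP are hypothetical names chosen for this example.

```python
# Hypothetical: elements whose content the -l rule leaves unwrapped.
NO_WRAP = {"style", "script", "pre"}


def wrap_words(text, width=72, indent=0):
    """Greedy line filling: pack as many words onto each line as fit in width.

    A word longer than the available width is never split; it ends up on a
    line of its own and is allowed to extend past the limit.
    """
    prefix = " " * indent
    lines = []
    current = ""
    for word in text.split():
        if not current:
            # The first word on a line is always placed, even if it overflows.
            current = prefix + word
        elif len(current) + 1 + len(word) <= width:
            current += " " + word
        else:
            lines.append(current)
            current = prefix + word
    if current:
        lines.append(current)
    return lines


def format_content(tag, text, width=72, indent=0):
    """Wrap an element's text, except for elements listed in NO_WRAP."""
    if tag.lower() in NO_WRAP:
        return text.splitlines()  # keep the original line breaks untouched
    return wrap_words(text, width, indent)


if __name__ == "__main__":
    for line in wrap_words("the quick brown fox jumps over the lazy dog", width=10):
        print(line)
```

For plain text, Python's standard textwrap.wrap with break_long_words=False and break_on_hyphens=False follows a similar greedy rule.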
OPERANDS
The following operand is supported:
- file
- The name of an HTML file. If absent, standard input is read
instead.
EXIT STATUS
The following exit values are returned:
- 0
- Successful completion.
- >0
- An error occurred while parsing the HTML file.
normalize still tries to correct the error and produce
output.
SEE ALSO
xml2asc(1),
UTF-8 (RFC 2279)
BUGS
The error recovery for incorrect HTML is primitive.
normalize will not omit an end tag if the white space
after it could possibly be significant. For example, it will not
remove the first </p> from "<div><p>text</p> <p>text</p></div>".