NAME
utrac - recognize and convert the charset and end-of-line type
of text files
SYNOPSIS
utrac [OPTION] [FILE]
DESCRIPTION
Utrac is a tool (and a library) that recognizes
the charset and the end-of-line type used in a text file. It can
also convert them. For 8-bit charsets, recognition is not
certain, so Utrac can also help the user choose the correct charset,
for instance by filtering the text and displaying only the lines that
matter (those containing extended characters).
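The difference between reliable and ambiguous recognition can be illustrated with a short Python sketch. This is not Utrac's algorithm, and the names `classify_certainty` and `detect_eol` are hypothetical. It only shows why pure ASCII and valid UTF-8 input can be identified with confidence while other 8-bit data stays ambiguous, and how an end-of-line type might be picked by counting. The EOL labels "CR" and "LF" are assumptions; only "CRLF" appears in this page.

```python
def classify_certainty(data: bytes):
    """Return 'ASCII' or 'UTF-8' when the answer is reliable, else None.

    None means the data uses some 8-bit charset that cannot be
    identified from the bytes alone (a ranking heuristic is needed).
    """
    if all(b < 0x80 for b in data):
        return "ASCII"
    try:
        # Strict decoding: random 8-bit text almost never forms valid UTF-8.
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return None


def detect_eol(data: bytes):
    """Return the most frequent end-of-line type, or None if there is none."""
    crlf = data.count(b"\r\n")
    # Lone CR and lone LF are what remains once CRLF pairs are removed.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    counts = {"CRLF": crlf, "CR": cr, "LF": lf}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

For example, `classify_certainty(b"caf\xe9")` returns `None`: the byte 0xE9 is not valid UTF-8 here, and it could be a different character in each 8-bit charset.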
OPTIONS
With no FILE, read standard input. With no OPTION, recognize and
write converted text to standard output.
- -p, --print-charset
- Print the name of the charset that best suits the input file.
- -P, --print-all-charset
- Print a ranked list of charsets. The first column is the score
including the locale bonus (language and system), the second is the
raw score, the third is a checksum of all extended characters (to show
which charsets produce the same result) and the fourth is the
charset name (charsets whose score with bonus and checksum are
identical are listed on the same line).
If the recognition is certain (ASCII or UTF-8), print only one name.
- -f, --from
- Force input charset (disable recognition) and/or EOL. For
instance, "UTF-8/CRLF".
- -t, --to
- Select output charset and/or EOL. See above.
- -L, --language
- Select language. All charsets that fit this language will get a
bonus during recognition. If none is specified, the LC_* variables
are used.
- -S, --system
- Select system. All charsets that fit this system will get a
bonus during recognition.
- -x, --ext-chars
- Print lines with extended characters (tries to print each
extended character no more than once).
- -d, --distribution
- Print distribution, i.e. the count of each 8-bit character.
- -a, --all-ext-chars
- Print each extended character of the file in each different
charset (UTF-8 output is recommended).
- -c, --colors
- (with -x or -a) Use color.
- -b, --bar
- Display a progress bar.
- -i, --info
- Print default/chosen parameters.
- -l, --list
- List charsets, EOL types, languages and systems.
- -h, --help
- Print some help.
- -v, --version
- Print version.
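The "CHARSET/EOL" form used by --from and --to can be illustrated with a minimal Python sketch. It is not Utrac's implementation: the helper names `parse_spec` and `convert` are hypothetical, the EOL labels "CR" and "LF" are assumed by analogy with "CRLF", and the charset names are those understood by Python's codecs, which may differ from the names Utrac lists with --list.

```python
# Assumed EOL labels; only "CRLF" is confirmed by this page.
EOL_STRINGS = {"CR": "\r", "LF": "\n", "CRLF": "\r\n"}


def parse_spec(spec: str):
    """Split 'CHARSET/EOL' into (charset, eol); either part may be absent."""
    charset, eol = None, None
    for part in spec.split("/"):
        if not part:
            continue
        if part.upper() in EOL_STRINGS:
            eol = part.upper()
        else:
            charset = part
    return charset, eol


def convert(data: bytes, from_spec: str, to_spec: str) -> bytes:
    """Decode with a forced input charset, then re-encode for the output."""
    src_charset, _ = parse_spec(from_spec)
    if src_charset is None:
        raise ValueError("this sketch needs an explicit input charset")
    dst_charset, dst_eol = parse_spec(to_spec)

    text = data.decode(src_charset)
    if dst_eol is not None:
        # Normalize every EOL to LF first, then apply the requested type.
        text = text.replace("\r\n", "\n").replace("\r", "\n")
        text = text.replace("\n", EOL_STRINGS[dst_eol])
    # With no output charset given, keep the input charset.
    return text.encode(dst_charset or src_charset)
```

In this sketch, `convert(data, "UTF-8/CRLF", "ISO-8859-1/LF")` plays the role of forcing the input side and selecting the output side, in the way the two options are described above.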
FILES
- charset.dat
- This file should be located in /usr/local/share/utrac/
or /usr/share/utrac/. It contains information about
charsets and their related charmaps. If you want to add new charsets
(they must be 8-bit and ASCII-compatible), see the script
merge.pl in the Utrac source directory.
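The ASCII-compatibility requirement can be checked with a few lines of Python before adding a charset. This is an illustrative sketch that relies on Python's own codec tables, not on charset.dat or merge.pl, and the function name `is_ascii_compatible` is hypothetical.

```python
def is_ascii_compatible(codec_name: str) -> bool:
    """True if bytes 0x00-0x7F decode to the same characters as in ASCII."""
    ascii_bytes = bytes(range(128))
    try:
        return ascii_bytes.decode(codec_name) == ascii_bytes.decode("ascii")
    except (UnicodeDecodeError, LookupError):
        # Undecodable input or a codec name Python does not know.
        return False
```

The check covers only the ASCII half of the requirement: a multi-byte charset such as UTF-8 also passes it, so whether a charset is 8-bit has to be verified separately.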
BUGS
Utrac is still a beta version, so you can expect to
find some bugs. Please report them to <antoine@alliancemca.net>.
If you have a text file that is not recognized well by Utrac, please
send it to help improve the recognition algorithm.
AUTHOR
Written by Antoine Calando <antoine@alliancemca.net>.
COPYRIGHT
Copyright © 2004 Alliance MCA.
This is free software; see the source for copying conditions. There
is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
SEE ALSO
More documentation is available at http://utrac.sourceforge.net