NAME
bzz - DjVu general purpose compression utility.
SYNOPSIS
Encoding:
bzz -e[blocksize] inputfile outputfile
Decoding:
bzz -d inputfile outputfile
DESCRIPTION
The first form of the command line (option -e) compresses the
data from file inputfile and writes the compressed data into
outputfile. The second form of the command line (option -d)
decompresses file inputfile and writes the output to outputfile.
OPTIONS
- -d
- Decoding mode.
- -e[blocksize]
- Encoding mode. The optional argument blocksize specifies
the size, in kilobytes, of the input file blocks processed by the
Burrows-Wheeler transform. The default block size is 2048 KB.
The maximal block size is 4096 KB. Specifying a larger block size
usually produces higher compression ratios and increases the memory
requirements of both the encoder and the decoder. There is no
benefit in specifying a block size that is larger than the input
file.
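The two command forms and the block size limits can be sketched in a small Python helper that only builds argument lists. This is an illustrative sketch, not part of bzz: the function names are hypothetical, the lower bound of 1 KB is an assumption (the text states only the 2048 KB default and the 4096 KB maximum), and the block count estimate assumes that 1 KB means 1024 bytes.

```python
import math

DEFAULT_BLOCKSIZE_KB = 2048  # default block size stated in OPTIONS
MAX_BLOCKSIZE_KB = 4096      # maximal block size stated in OPTIONS


def bzz_encode_args(inputfile, outputfile, blocksize_kb=None):
    """Build the argument list for 'bzz -e[blocksize] inputfile outputfile'."""
    if blocksize_kb is None:
        flag = "-e"  # let bzz use its default block size
    else:
        # Lower bound of 1 is an assumption; the upper bound comes from the text.
        if not 1 <= blocksize_kb <= MAX_BLOCKSIZE_KB:
            raise ValueError("blocksize must be between 1 and 4096 KB")
        flag = "-e%d" % blocksize_kb
    return ["bzz", flag, inputfile, outputfile]


def bzz_decode_args(inputfile, outputfile):
    """Build the argument list for 'bzz -d inputfile outputfile'."""
    return ["bzz", "-d", inputfile, outputfile]


def estimated_block_count(file_size_bytes, blocksize_kb=DEFAULT_BLOCKSIZE_KB):
    """Estimate how many blocks a file is split into (assumes 1 KB = 1024 bytes)."""
    return math.ceil(file_size_bytes / (blocksize_kb * 1024))
```

If bzz is installed, the resulting list can be passed to subprocess.run(args, check=True). A file that already fits into a single block gains nothing from a larger block size, which is what estimated_block_count returning 1 indicates.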
ALGORITHMS
The Burrows-Wheeler transform is performed using
a combination of the Karp-Miller-Rosenberg and the
Bentley-Sedgewick algorithms. This is comparable to (Sadakane, DCC
98) with a slightly more flexible ranking scheme. Symbols are then
ordered according to a running estimate of their occurrence
frequencies. The symbol ranks are then coded using a simple fixed
tree and the ZP binary adaptive coder (Bottou, DCC 98).
The Burrows-Wheeler transform is also used in the well-known
compressor bzip2. The originality of bzz is the use
of the ZP adaptive coder. The adaptation noise may cost up to 5% in
file size, but this penalty is usually offset by the benefits of
adaptation.
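For readers unfamiliar with the Burrows-Wheeler transform, the following Python sketch shows the transform and its inverse on a single block. It is deliberately naive: it sorts all rotations directly, which is far slower than the Karp-Miller-Rosenberg and Bentley-Sedgewick combination described above, and it makes no claim about the block layout bzz actually uses. The function names and the (last column, primary index) output convention are assumptions chosen for illustration.

```python
from typing import Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive Burrows-Wheeler transform of one block.

    Returns the last column of the sorted rotation matrix and the row
    index at which the original block appears.
    """
    if not data:
        return b"", 0
    n = len(data)
    doubled = data + data  # rotation i is doubled[i:i + n]
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last = bytes(doubled[i + n - 1] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert bwt() given the last column and the primary row index."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of positions by symbol maps each row of the first
    # column to the row holding the same symbol in the last column.
    nxt = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)


print(bwt(b"banana"))  # (b'nnbaaa', 3)
```

The transform groups identical symbols that share a context, as in the run "aaa" above, which is what makes the subsequent ranking and coding stages effective.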
PERFORMANCE
The following table shows comparative results
(in bits per character) on the Canterbury Corpus.
The very good bzz performance on the spreadsheet file
excl puts the weighted average ahead of much more
sophisticated compressors such as fsmx.
Compression performance (bits per character)

|          | text | fax  | csrc | excl | sprc | tech | poem | html | lisp | man  | play | Weighted | Average |
|----------|------|------|------|------|------|------|------|------|------|------|------|----------|---------|
| compress | 3.27 | 0.97 | 3.56 | 2.41 | 4.21 | 3.06 | 3.38 | 3.68 | 3.90 | 4.43 | 3.51 | 2.55     | 3.31    |
| gzip -9  | 2.85 | 0.82 | 2.24 | 1.63 | 2.67 | 2.71 | 3.23 | 2.59 | 2.65 | 3.31 | 3.12 | 2.08     | 2.53    |
| bzip2 -9 | 2.27 | 0.78 | 2.18 | 1.01 | 2.70 | 2.02 | 2.42 | 2.48 | 2.79 | 3.33 | 2.53 | 1.54     | 2.23    |
| ppmd     | 2.31 | 0.99 | 2.11 | 1.08 | 2.68 | 2.19 | 2.48 | 2.38 | 2.43 | 3.00 | 2.53 | 1.65     | 2.20    |
| fsmx     | 2.10 | 0.79 | 1.89 | 1.48 | 2.52 | 1.84 | 2.21 | 2.24 | 2.29 | 2.91 | 2.35 | 1.63     | 2.06    |
| bzz      | 2.25 | 0.76 | 2.13 | 0.78 | 2.67 | 2.00 | 2.40 | 2.52 | 2.60 | 3.19 | 2.52 | 1.44     | 2.16    |

Note that DjVu contributors have several entries in this table.
Program compress was written some time ago by Joe Orost.
Program ppmd is an improvement of the PPM-C method invented
by Paul Howard.
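The bits-per-character figures reported in this section follow the usual definition: eight times the compressed size divided by the original size, both in bytes. The Python sketch below illustrates the metric with two compressors from the standard library. zlib implements the DEFLATE method used by gzip (without the gzip file header) and bz2 implements bzip2; neither is bzz, and the sample data is an arbitrary assumption, so the output is not comparable to the Canterbury Corpus results.

```python
import bz2
import zlib


def bits_per_character(original: bytes, compressed: bytes) -> float:
    """Return the compressed size in bits per byte of original input."""
    if not original:
        raise ValueError("original data must not be empty")
    return 8.0 * len(compressed) / len(original)


if __name__ == "__main__":
    # Arbitrary, highly repetitive sample used only to exercise the metric.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 200
    for name, packed in (
        ("zlib level 9", zlib.compress(sample, 9)),
        ("bz2 level 9", bz2.compress(sample, 9)),
    ):
        print("%-12s %.3f bits per character" % (name, bits_per_character(sample, packed)))
```

A value of 8.0 means no compression at all, and lower values are better.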
CREDITS
Program bzz was written by Léon Bottou
<leonb@users.sourceforge.net>
and was then improved by Andrei Erofeev <andrew_erofeev@yahoo.com>,
Bill Riemers <docbill@sourceforge.net>
and many others.
SEE ALSO
djvu(1),
compress(1),
gzip(1),
bzip2(1)