bzz - Man Page
DjVu general purpose compression utility.
Synopsis
Encoding
bzz -e[blocksize] inputfile outputfile
Decoding
bzz -d inputfile outputfile
Description
The first form of the command line (option -e) compresses the data from file inputfile and writes the compressed data into outputfile. The second form of the command line (option -d) decompresses file inputfile and writes the output to outputfile.
Options
- -d
Decoding mode.
- -e[blocksize]
Encoding mode. The optional argument blocksize specifies the size, in kilobytes, of the input file blocks processed by the Burrows-Wheeler transform. The default block size is 2048 KB. The maximum block size is 4096 KB. Specifying a larger block size usually produces higher compression ratios and increases the memory requirements of both the encoder and the decoder. There is no benefit in specifying a block size larger than the input file.
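The two command-line forms can be assembled programmatically. The following is a minimal sketch, assuming the block size is appended directly to -e as the synopsis suggests. The helper names bzz_encode_args and bzz_decode_args are hypothetical and are not part of bzz. Only the upper limit stated above is checked; any other limit enforced by bzz itself is not modelled here.

```python
from typing import List, Optional

# Maximum block size in kilobytes, as stated in the option description above.
MAX_BLOCK_KB = 4096


def bzz_encode_args(inputfile: str, outputfile: str,
                    blocksize_kb: Optional[int] = None) -> List[str]:
    """Build the argument list for: bzz -e[blocksize] inputfile outputfile.

    Hypothetical helper: it only assembles the arguments, it does not run bzz.
    """
    flag = "-e"
    if blocksize_kb is not None:
        if not 0 < blocksize_kb <= MAX_BLOCK_KB:
            raise ValueError(
                f"block size must be between 1 and {MAX_BLOCK_KB} KB")
        # The block size is attached to the option with no separating space.
        flag += str(blocksize_kb)
    return ["bzz", flag, inputfile, outputfile]


def bzz_decode_args(inputfile: str, outputfile: str) -> List[str]:
    """Build the argument list for: bzz -d inputfile outputfile."""
    return ["bzz", "-d", inputfile, outputfile]
```

If bzz is installed, the resulting list can be passed to subprocess.run(args, check=True); the file names used with it are up to the caller.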
Algorithms
The Burrows-Wheeler transform is performed using a combination of the Karp-Miller-Rosenberg and the Bentley-Sedgewick algorithms. This is comparable to the method of Sadakane (DCC 98), with a slightly more flexible ranking scheme. Symbols are then ordered according to a running estimate of their occurrence frequencies. The symbol ranks are then coded using a simple fixed tree and the ZP binary adaptive coder (Bottou, DCC 98).
The Burrows-Wheeler transform is also used in the well-known compressor bzip2. What distinguishes bzz is its use of the ZP adaptive coder. The adaptation noise can cost up to 5 percent in file size, but this penalty is usually offset by the benefits of adaptation.
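To illustrate what the Burrows-Wheeler transform computes, here is a deliberately naive sketch that sorts all rotations of the input. It is not the method used by bzz, which relies on the Karp-Miller-Rosenberg and Bentley-Sedgewick algorithms, and it does not cover the symbol ranking or ZP coding stages. The function names and the choice of a zero byte as end marker are assumptions made for this example only.

```python
def bwt_naive(data: bytes, sentinel: bytes = b"\x00") -> bytes:
    """Return the last column of the sorted rotations of data + sentinel.

    Naive illustration only: time and memory grow quadratically (or worse)
    with the input length, so this is unsuitable for real block sizes.
    """
    if sentinel in data:
        raise ValueError("input must not contain the sentinel byte")
    s = data + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return bytes(rot[-1] for rot in rotations)


def inverse_bwt_naive(last: bytes, sentinel: bytes = b"\x00") -> bytes:
    """Invert bwt_naive by repeatedly prepending the last column and sorting."""
    n = len(last)
    table = [b""] * n
    for _ in range(n):
        table = sorted(last[i:i + 1] + table[i] for i in range(n))
    # The original string is the rotation that ends with the sentinel.
    for row in table:
        if row.endswith(sentinel):
            return row[:-1]
    raise ValueError("sentinel not found in transformed data")
```

The transform tends to group identical symbols together, which is what makes the subsequent ranking and adaptive coding stages effective.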
Performance
The following table shows comparative results (in bits per character) on the Canterbury Corpus (http://corpus.canterbury.ac.nz). The very good performance of bzz on the spreadsheet file excl puts its weighted average ahead of that of much more sophisticated compressors such as fsmx.
Compression performance (bits per character)
Program | text | fax | csrc | excl | sprc | tech | poem | html | lisp | man | play | Weighted | Average |
compress | 3.27 | 0.97 | 3.56 | 2.41 | 4.21 | 3.06 | 3.38 | 3.68 | 3.90 | 4.43 | 3.51 | 2.55 | 3.31 |
gzip -9 | 2.85 | 0.82 | 2.24 | 1.63 | 2.67 | 2.71 | 3.23 | 2.59 | 2.65 | 3.31 | 3.12 | 2.08 | 2.53 |
bzip2 -9 | 2.27 | 0.78 | 2.18 | 1.01 | 2.70 | 2.02 | 2.42 | 2.48 | 2.79 | 3.33 | 2.53 | 1.54 | 2.23 |
ppmd | 2.31 | 0.99 | 2.11 | 1.08 | 2.68 | 2.19 | 2.48 | 2.38 | 2.43 | 3.00 | 2.53 | 1.65 | 2.20 |
fsmx | 2.10 | 0.79 | 1.89 | 1.48 | 2.52 | 1.84 | 2.21 | 2.24 | 2.29 | 2.91 | 2.35 | 1.63 | 2.06 |
bzz | 2.25 | 0.76 | 2.13 | 0.78 | 2.67 | 2.00 | 2.40 | 2.52 | 2.60 | 3.19 | 2.52 | 1.44 | 2.16 |
Note that DjVu contributors have several entries in this table. Program compress was written some time ago by Joe Orost. Program ppmd is an improvement of the PPM-C method invented by Paul Howard.
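The bits-per-character figures in the table can be reproduced for any file pair with the usual definition: eight times the compressed size divided by the original size. The sketch below assumes one character is one byte; the function names are hypothetical.

```python
import os


def bits_per_character(original_bytes: int, compressed_bytes: int) -> float:
    """Compressed bits spent per input character (one character = one byte)."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 8.0 * compressed_bytes / original_bytes


def bits_per_character_for_files(original_path: str,
                                 compressed_path: str) -> float:
    """Same measure, computed from the sizes of two files on disk."""
    return bits_per_character(os.path.getsize(original_path),
                              os.path.getsize(compressed_path))
```

For example, a 1000-byte input compressed to 270 bytes corresponds to 2.16 bits per character.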
Credits
Program bzz was written by Léon Bottou <leonb@users.sourceforge.net> and was then improved by Andrei Erofeev <andrew_erofeev@yahoo.com>, Bill Riemers <docbill@sourceforge.net> and many others.