statist - Man Page
calculate Huffman distribution for freeze(1)
Synopsis
statist [ -gx... ]
Description
The default table is tuned for both C texts and executable files (as in LHARC). If you intend to freeze other kinds of files (natural language texts, databases, images, fonts, etc.), you can calculate the distribution of matching positions with the `statist' program, which computes and displays that distribution for the given file. This is useful mainly for large files (100K or more).
Although the built-in position table is general-purpose, tuning can improve the compression rate by up to one additional percent (observed mainly on text files).
Usage
statist [-g...] < sample_file
or
gensample | statist [-g...]
where `gensample' is a program generating some sample stream of bytes similar to files to be frozen.
The -g and -x switches have the same meaning as for freeze(1) and may be repeated.
You can also view the intermediate values and watch how they change by pressing the INTR key at any time.
Note: If you use gensample | statist, remember that INTR affects BOTH processes!
The results have the following format:
n1 n2 n3 n4 n5 n6 n7 n8 (uncertainty = x)
Average match length: xx.yy
Percentile 99.9: p999
Percentile 99.5: p995
Percentile 99.0: p990
Percentile 97.0: p970
Percentile 95.0: p950
Percentile 90.0: p900
Percentile 80.0: p800
Percentile 70.0: p700
Percentile 50.0: p500
Sigma: xx.yy
Here n1 through n8 are the values of the calculated position table elements, and uncertainty is a number indicating the validity of the results (a non-zero uncertainty means the results may be unusable). The other values (average match length, percentiles and sigma) are for information only.
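As an illustration, the first line of the results could be picked apart by a small script like the one below. This is only a sketch: it assumes the line looks exactly as shown above (eight non-negative integers followed by the uncertainty in parentheses), the function name parse_table_line is hypothetical, and the sample line is made up for the example rather than taken from a real run.

```python
import re

# Assumed layout, taken from the format shown above:
#   n1 n2 n3 n4 n5 n6 n7 n8 (uncertainty = x)
TABLE_LINE = re.compile(r"\s*((?:\d+\s+){8})\(uncertainty\s*=\s*(\d+)\)\s*")

def parse_table_line(line):
    """Return (table, uncertainty) from the first line of the results."""
    match = TABLE_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"not a position-table line: {line!r}")
    table = [int(token) for token in match.group(1).split()]
    return table, int(match.group(2))

# Hypothetical sample line, invented for illustration only.
sample = "0 0 1 2 7 16 36 0 (uncertainty = 0)"
table, uncertainty = parse_table_line(sample)
if uncertainty != 0:
    print("warning: results may be unusable")
print("table:", " ".join(str(n) for n in table))
```

Checking the uncertainty before reusing the table mirrors the advice above: a non-zero value is a warning sign, not a hard failure.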
You may create the /etc/default/freeze file (if you do not like the /etc/default/ directory, choose another; under MS-DOS the file is FREEZE.CNF in the directory containing FREEZE.EXE). Each entry has the following format: name = n1 n2 n3 n4 n5 n6 n7 n8 (the name must start in column 1). For example:
---------- cut here -----------
# This is freeze's defaults file
russian=0 0 1 2 6 20 31 2 # The sample was mailx.lp (Russian)
english=0 0 1 2 7 16 36 0 # The sample was gcc.lp (English)
# End of file
---------- cut here -----------
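To check such a file before using it, the entries can be read back with a short script. The sketch below is not how freeze itself reads the file. It assumes that `#' starts a comment running to the end of the line, that spaces around `=' are optional, and that lines whose name does not start in column 1 are ignored. The function name parse_defaults is hypothetical.

```python
def parse_defaults(text):
    """Map each entry name to its list of eight position-table values."""
    tables = {}
    for raw in text.splitlines():
        # Assumption: '#' begins a comment that runs to the end of the line.
        line = raw.split("#", 1)[0].rstrip()
        # Skip blank lines and lines whose name does not start in column 1.
        if not line or line[0].isspace() or "=" not in line:
            continue
        name, _, values = line.partition("=")
        numbers = values.split()
        if len(numbers) != 8 or not all(n.isdigit() for n in numbers):
            raise ValueError(f"expected 8 integers for {name.strip()!r}")
        tables[name.strip()] = [int(n) for n in numbers]
    return tables

# The example file shown above.
example = """\
# This is freeze's defaults file
russian=0 0 1 2 6 20 31 2 # The sample was mailx.lp (Russian)
english=0 0 1 2 7 16 36 0 # The sample was gcc.lp (English)
# End of file
"""

for name, table in parse_defaults(example).items():
    print(name, table)
```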
If you find values that are better than the default for both text (C programs) and binary (executable) files, please send them to me.
Important note: statist.c is NOT part of the freeze package; it is an additional feature.
See Also
Diagnostics
Huffman tree has more than 8 levels, reducing...
Self-explanatory, but the reduction sometimes falls into an infinite loop.
xxxK
Progress indicator, written after each 4K of the file has been processed.
Bugs
Sometimes using results with uncertainty = 1 (from one file) gives a worse compression rate than the default, while using results with uncertainty = 13 (from another file) works quite well.
Please send descriptions of bugs, incompatibilities, etc. to leo@s514.ipmce.su.