ddr_lzo - Man Page
Data de/compression plugin for dd_rescue
Synopsis
-L lzo[=option[:option[:...]]]
or
-L /path/to/libddr_lzo.so[=option[:option[:...]]]
Description
About
LZO is an algorithm that de/compresses data. It is tuned for speed (especially decompression speed) and trades some compression ratio for it. Variants with slow compression (but still very fast decompression) are available as well; see the algo parameter below.
This plugin has been written for dd_rescue and uses the plugin interface from it. See the dd_rescue(1) man page for more information on dd_rescue.
Options
Options are passed using the dd_rescue option passing syntax: the name of the plugin (lzo) is optionally followed by an equal sign (=), and options are separated by colons (:). The lzo plugin also allows most options to be abbreviated to five or six letters. See the Examples section below.
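To illustrate the option syntax and the abbreviation rule, here is a minimal C sketch of how a parameter string such as algo=lzo1x_999:compr:bench could be split and matched. It is not ddr_lzo's parser: the option table, the per-option minimum lengths (taken from the bracket notation used in this man page, e.g. compr[ess]) and the functions match_option() and parse_params() are hypothetical, and option values are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option table: the full name and the minimum number of
 * letters that must be typed. The minimums follow the bracket notation
 * in this man page (compr[ess], opt[imize], ...), not ddr_lzo's source. */
struct opt_def {
    const char *name;
    size_t min_len;
};

static const struct opt_def opt_table[] = {
    { "compress", 5 },  { "decompress", 5 }, { "optimize", 3 },
    { "benchmark", 5 }, { "nodiscard", 6 },  { "search", 6 },
    { "debug", 5 },     { "crc32", 5 },      { "algo", 4 },
    { "flags", 5 },
};

/* Return the canonical name if word[0..len) is an allowed abbreviation,
 * or NULL if no option matches. */
static const char *match_option(const char *word, size_t len)
{
    for (size_t i = 0; i < sizeof(opt_table) / sizeof(opt_table[0]); i++) {
        const struct opt_def *o = &opt_table[i];
        if (len >= o->min_len && len <= strlen(o->name) &&
            strncmp(word, o->name, len) == 0)
            return o->name;
    }
    return NULL;
}

/* Split "opt1:opt2=value:..." at ':' and resolve each name (the part
 * before the first '='). Stores up to max canonical names in out[] and
 * returns the count, or -1 on an unknown option. Values are ignored. */
static int parse_params(const char *params, const char **out, int max)
{
    int n = 0;
    const char *p = params;

    assert(params != NULL && out != NULL);
    while (*p) {
        const char *end = strchr(p, ':');
        size_t toklen = end ? (size_t)(end - p) : strlen(p);
        const char *eq = memchr(p, '=', toklen);
        size_t namelen = eq ? (size_t)(eq - p) : toklen;
        const char *name = match_option(p, namelen);

        if (name == NULL)
            return -1;
        if (n < max)
            out[n++] = name;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return n;
}
```

Calling parse_params("algo=lzo1x_999:compr:bench", names, 8) returns 3 and fills names with algo, compress and benchmark, while a too-short abbreviation such as comp is rejected.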
Compression or decompression
The lzo dd_rescue plugin (subsequently referred to as just ddr_lzo, which reflects the variable part of the filename libddr_lzo.so) chooses compression or decompression mode automatically if one of the input/output files has an lzo or tzo ([lt]zo) suffix; otherwise you may specify the compr[ess] or decom[press] parameter on the command line.
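The automatic mode selection can be sketched as follows. This is a hypothetical helper, not ddr_lzo code; it assumes that the [lt]zo suffix must be preceded by a dot, and that an ambiguous situation (both files or neither file with such a suffix) requires an explicit compress or decompress parameter.

```c
#include <assert.h>
#include <string.h>

enum lzo_mode { MODE_UNKNOWN, MODE_COMPRESS, MODE_DECOMPRESS };

/* Hypothetical check: does the file name end in ".lzo" or ".tzo"? */
static int has_lzo_suffix(const char *fname)
{
    size_t len;

    assert(fname != NULL);
    len = strlen(fname);
    if (len < 4 || fname[len - 4] != '.')
        return 0;
    return (fname[len - 3] == 'l' || fname[len - 3] == 't') &&
           strcmp(fname + len - 2, "zo") == 0;
}

/* Pick a mode from the file names; MODE_UNKNOWN means the user has to
 * pass compress or decompress explicitly (assumed policy). */
static enum lzo_mode guess_mode(const char *in, const char *out)
{
    int in_lzo = has_lzo_suffix(in);
    int out_lzo = has_lzo_suffix(out);

    if (out_lzo && !in_lzo)
        return MODE_COMPRESS;
    if (in_lzo && !out_lzo)
        return MODE_DECOMPRESS;
    return MODE_UNKNOWN;
}
```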
The parameter opt[imize] will tell ddr_lzo to do an optimization pass after compression. This might speed up decompression by a few percent when creating compressed data with high compression levels and large block sizes.
The plugin also supports the parameter bench[mark]; if it is specified, ddr_lzo will output some information about CPU usage and the resulting compression or decompression bandwidth. (For small files, the numbers become meaningless due to jitter and limited time resolution; ddr_lzo will skip the output if the numbers are very tiny.)
De/compression algorithm
The lzo plugin supports a number of the (de)compression algorithms from liblzo2. You can specify which one to use by passing algo=XXX, where XXX can be lzo1x_1, lzo1x_1_15, lzo1x_999, lzo1x_1_11, lzo1x_1_12, lzo1y_1, lzo1y_999, lzo1f_1, lzo1f_999, lzo1b_1 ... lzo1b_9, lzo1b_99, lzo1b_999, lzo2a_999. Pass algo=help to get a list of available algorithms. Consult the liblzo2 documentation for more information on the algorithms. Note that only the first three are supported by lzop (it can decompress the first five, though, as they are all handled by the same decompression routine).
The default (lzo1x_1) is a good choice for fast compression and very fast decompression, and it ensures compatibility with lzop. For higher compression you might want to choose lzo1x_999, which is very slow but lzop-compatible, or lzo2a_999, which is twice as fast but not compatible with lzop.
Debugging
The debug flag causes ddr_lzo to output information about blocks and other internal data. It is meant for debugging purposes.
Finally, there is also a flags=XXXX parameter. It sets the flags field in the header (default is 0x03000403) and is used for testing only. It is not sanity-checked, and you can easily set values that will break decompression or cause ddr_lzo to abort. Really only use it for development purposes when you know the meaning of the various bits.
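For development work, a small decoder helps to see what a flags value means. The bit values below are an assumption based on our reading of lzop's header definitions (they are not documented in this man page), so verify them against the lzop source before use. With these assumed values, the default 0x03000403 decodes to Adler-32 checksums of the uncompressed (ADLER32_D) and compressed (ADLER32_C) data, the multipart bit, and operating system code 3 (Unix). The function describe_flags() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* lzop header flag bits, ASSUMED from lzop's sources; verify before use.
 * Bits not listed here are not printed. */
static const struct {
    uint32_t bit;
    const char *name;
} flag_names[] = {
    { 0x00000001u, "ADLER32_D" },     /* Adler-32 of uncompressed data */
    { 0x00000002u, "ADLER32_C" },     /* Adler-32 of compressed data */
    { 0x00000004u, "STDIN" },
    { 0x00000008u, "STDOUT" },
    { 0x00000010u, "NAME_DEFAULT" },
    { 0x00000020u, "DOSISH" },
    { 0x00000040u, "H_EXTRA_FIELD" },
    { 0x00000080u, "H_GMTDIFF" },
    { 0x00000100u, "CRC32_D" },       /* CRC-32 of uncompressed data */
    { 0x00000200u, "CRC32_C" },       /* CRC-32 of compressed data */
    { 0x00000400u, "MULTIPART" },
    { 0x00000800u, "H_FILTER" },
    { 0x00001000u, "H_CRC32" },
};

#define F_OS_MASK  0xff000000u /* operating system code (assumed) */
#define F_OS_SHIFT 24

/* Write a readable description of flags into buf (size must be > 0);
 * the text is truncated safely if buf is too small. */
static void describe_flags(uint32_t flags, char *buf, size_t size)
{
    size_t used;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
        if (flags & flag_names[i].bit) {
            used = strlen(buf);
            snprintf(buf + used, size - used, "%s%s",
                     used ? "|" : "", flag_names[i].name);
        }
    }
    used = strlen(buf);
    snprintf(buf + used, size - used, "%sos=%u", used ? " " : "",
             (unsigned)((flags & F_OS_MASK) >> F_OS_SHIFT));
}
```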
Error recovery
On compression, when input bytes can't be read, ddr_lzo will encode holes in the compressed output file -- these will be skipped over on decompression.
On decompression, erroneous blocks can be detected by the checksums (most often) or by the decompressor. The lzo plugin tries to continue in that case if the block header that specifies the de/compressed lengths is intact: the affected block is skipped over (leaving a hole), and decompression continues with the next block. This prevents corrupt data from ending up in the output file (and preexisting, potentially good data there from being overwritten).
This behaviour can be modified with the nodisc[ard] option. When it is given, the decompressor's output (padded with zeros if it is too short for the block) is written to the output file. Even though the data is known to be incorrect, with some luck parts of the block may actually be valid.
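The per-block decision described in the two paragraphs above can be modelled as follows. This is a simplified, hypothetical sketch (handle_block() and its parameters are illustrative, not ddr_lzo's internals): a block with a bad checksum or short output becomes a hole by default, while nodiscard zero-pads the output and writes it anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum block_action { WRITE_DATA, SKIP_HOLE };

/* Decide what to do with one decompressed block.
 *   out          buffer holding the decompressor output (expected_len bytes)
 *   expected_len uncompressed length from the (intact) block header
 *   got_len      number of bytes the decompressor actually produced
 *   checksum_ok  nonzero if the stored checksum matched
 *   nodiscard    nonzero if the nodiscard option was given */
static enum block_action handle_block(unsigned char *out, size_t expected_len,
                                      size_t got_len, int checksum_ok,
                                      int nodiscard)
{
    assert(out != NULL && got_len <= expected_len);
    if (checksum_ok && got_len == expected_len)
        return WRITE_DATA;  /* good block: write it */
    if (!nodiscard)
        return SKIP_HOLE;   /* default: skip, keep existing output data */
    /* nodiscard: pad short output with zeros and write it anyway */
    memset(out + got_len, 0, expected_len - got_len);
    return WRITE_DATA;
}
```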
When block headers are corrupt, your situation is desperate, as you will have lost the remainder of the file. To recover pieces after such a block header corruption, ddr_lzo supports the search option. With it, the plugin searches the input file (starting from the position given to dd_rescue with -s) for data that looks like a block header; if a valid-looking header is found, it starts decompressing from that position. (If you can't find the data you are looking for, you may want to study the output generated with the debug flag.)
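A header search boils down to scanning for byte patterns that could plausibly be a block header. The sketch below assumes that an lzop block starts with the uncompressed length followed by the compressed length, both as 32-bit big-endian values, and accepts a position if both lengths are non-zero, the uncompressed length does not exceed a maximum block size, and the compressed length does not exceed the uncompressed length. These criteria are illustrative only; ddr_lzo's actual heuristics (for example, checksum verification) may be stricter, and find_block_header() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian value. */
static uint32_t get_be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Return the offset of the first position in buf that looks like a
 * block header (assumed layout: uncompressed length, then compressed
 * length, both 32-bit big-endian), or -1 if none is found. */
static long find_block_header(const unsigned char *buf, size_t len,
                              uint32_t max_block)
{
    assert(buf != NULL || len == 0);
    for (size_t off = 0; off + 8 <= len; off++) {
        uint32_t ulen = get_be32(buf + off);     /* uncompressed length */
        uint32_t clen = get_be32(buf + off + 4); /* compressed length */

        if (ulen > 0 && ulen <= max_block && clen > 0 && clen <= ulen)
            return (long)off;
    }
    return -1;
}
```

A real search would then try to decompress from the candidate offset and move on to the next candidate if that fails.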
Supported dd_rescue features
dd_rescue supports appending to files with the -x/--extend option. If ddr_lzo is loaded and the output file is an existing .lzo file, the new data will be appended in the format specified by the existing lzop header. If the header does not indicate a multipart (archive) file, the EOF marker will be overwritten, so that a valid .lzo file results. Otherwise, a new part will be appended.
When dd_rescue can't read data or a sizable amount of zero-filled data is found and the -a/--sparse option is active, then dd_rescue will create sparse files (files with holes inside). This is an optimization to save space -- the holes are interpreted as zeroes again on normal reads, so this is transparent. The holes also can be useful to ensure that good data is not overwritten with zeroes when data couldn't be read.
When the lzo module is fed holes in compression mode, it encodes them in the compressed output file in a special way (using the lzop multipart feature, as lzop unfortunately chokes on blocks with a compressed length of 0). On decompression, the holes result in the data being skipped over again (creating a hole in the output file if no data preexists at that location).
lzop compatibility
The plugin uses the lzo1x_1 algorithm by default (just like lzop does by default) and generates adler32 checksums to allow detecting data corruption. The compressed files are compatible with lzop and ddr_lzo should handle files generated by lzop.
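Adler-32, the default checksum, is simple enough to show in full. This is the standard algorithm from RFC 1950 (two running sums modulo 65521, starting from the value 1), written as a standalone C sketch; liblzo2 ships its own lzo_adler32() routine, so treat adler32_update() here as an illustration only, not as ddr_lzo's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime below 2^16 */

/* Update an Adler-32 value with len bytes; start with adler = 1. */
static uint32_t adler32_update(uint32_t adler, const unsigned char *buf,
                               size_t len)
{
    uint32_t a = adler & 0xffffu; /* sum of bytes */
    uint32_t b = adler >> 16;     /* sum of the running sums */

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```

For the string Wikipedia this yields 0x11E60398, and feeding the data in pieces gives the same result as a single call.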
Multipart (archive) files from lzop are decompressed to ONE output file in the order they are stored.
Multipart files created by the lzo plugin to encode holes will be extracted by lzop to several files. The holes are encoded in the filenames (with a sequence number and the hole size up to 1TB; the timestamp is used for huge holes), so a proper assembly of the fragments is possible even without ddr_lzo.
lzop only supports the lzo1x_ family of algorithms. If you choose another algorithm to compress data with ddr_lzo, it will set the needed_version_to_extract field in the resulting lzop file to ddr_lzo's own version (1.789) to indicate incompatibility with lzop (as of lzop 1.03).
lzop by default uses a block size of 256KiB (on Unix systems), but supports de/compression with smaller block sizes as well. It needs to be recompiled to support larger block sizes, up to a possible maximum of 64MiB. Thus staying at or below 256KiB is recommended; even when lzop compatibility is no concern, blocks larger than 16MiB are not recommended (see below).
Blocksize considerations
When decompressing, the (soft) block size chosen in dd_rescue must be sufficient (at least half the block size used when compressing); if you choose too small a block size, ddr_lzo will warn and exit.
For compression, the (soft) block size chosen in dd_rescue determines the size of the blocks fed to the lzo??_?_compress() routines. Larger block sizes typically result in slightly better compression ratios, though the returns on increasing the block size diminish quickly beyond 64KiB.
The dd_rescue default (128KiB) is a good choice. It is NOT recommended to increase the block size too much: when an lzo file gets corrupted, at least one block will be lost, so larger blocks mean larger damage. Blocks larger than 16MiB also do not work well with the error tolerance features of ddr_lzo. Finally, blocks larger than 256KiB require a recompiled lzop if you want to use lzop to process the .lzo files, and blocks larger than 64MiB cannot be decompressed even by a recompiled lzop.
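The block size rules from this section can be condensed into a small checker. The thresholds are the ones stated above; the function names and warning bits are hypothetical, and the checker does not replace ddr_lzo's own checks.

```c
#include <assert.h>
#include <stdint.h>

#define KiB 1024u
#define MiB (1024u * KiB)

/* Hypothetical warning bits for a compression block size */
#define BS_LZOP_RECOMPILE 0x1u /* > 256 KiB: stock lzop cannot process it */
#define BS_LZOP_NEVER     0x2u /* > 64 MiB: not even a recompiled lzop can */
#define BS_WEAK_RECOVERY  0x4u /* > 16 MiB: ddr_lzo error tolerance suffers */

static unsigned blocksize_warnings(uint64_t bs)
{
    unsigned w = 0;

    if (bs > 256 * KiB)
        w |= BS_LZOP_RECOMPILE;
    if (bs > 64 * MiB)
        w |= BS_LZOP_NEVER;
    if (bs > 16 * MiB)
        w |= BS_WEAK_RECOVERY;
    return w;
}

/* Decompression: the soft block size must be at least half the
 * block size that was used for compression. */
static int decompress_bs_ok(uint64_t soft_bs, uint64_t compress_bs)
{
    assert(compress_bs > 0);
    return 2 * soft_bs >= compress_bs;
}
```

With the dd_rescue default of 128KiB, blocksize_warnings() returns 0, and decompressing such a file would still work with a soft block size of 64KiB but not with 32KiB.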
Bugs/Limitations
Maturity
The plugin is new as of dd_rescue 1.43. Do not yet rely on data saved with ddr_lzo as the only backup for valuable data. Also expect some changes to ddr_lzo in the not too distant future. (This should not break the file format, as we're following lzop ....)
Compressed data is more sensitive to data corruption than plain data. Note that the checksums (adler32 or crc32) in the lzop file format do NOT allow errors to be corrected; they only allow a somewhat reliable detection of data corruption. (Ideally, a 32-bit checksum misses just 1 out of 2^32 corruptions; for small changes, crc32 comes a bit closer to that ideal than adler32. You may pass the crc32 option to use crc32 instead of adler32 checksums at the expense of some speed; unfortunately, the crc32 polynomial used by lzop/gzip/... is not the crc32c polynomial that has hardware support on many CPUs these days.) Also note that the checksums are NOT cryptographic hashes; a malicious attacker can easily find modifications of the data that do not alter the checksums. Use MD5, or better SHA-256/SHA-512, to ensure integrity against attackers. Use par2 or similar software to create error correcting codes (Reed-Solomon / erasure codes) if you want to be able to recover data in the face of corruption.
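To illustrate the remark about crc32 and crc32c: both are computed the same way and differ only in the polynomial, so their results are not interchangeable. The bitwise C sketch below uses the reflected polynomials 0xEDB88320 (CRC-32 as used by gzip) and 0x82F63B78 (CRC-32C, Castagnoli); it is slow and meant for illustration only, and crc32_generic() is a hypothetical name. The well-known check values for the input 123456789 are 0xCBF43926 and 0xE3069283 respectively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY  0xEDB88320u /* gzip-style CRC-32, reflected form */
#define CRC32C_POLY 0x82F63B78u /* Castagnoli CRC-32C, reflected form */

/* Bit-at-a-time reflected CRC with the usual all-ones initial value
 * and final inversion; only the polynomial differs between variants. */
static uint32_t crc32_generic(uint32_t poly, const unsigned char *buf,
                              size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ poly : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}
```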
Security
While care has been taken to check the result of memory allocations ..., the decompressor code has not been audited, and only limited fuzzing has been applied to ensure it is not vulnerable to malicious data; be careful when you process data from untrusted sources.
Examples
- dd_rescue -ptAL lzo=algo=lzo1x_1_15:compress,hash=alg=sha256 infile outfile
compresses data from infile into outfile using the algorithm lzo1x_1_15 and calculates the sha256 hash value of outfile. outfile will have its timestamp and access rights copied over from infile, and it will be truncated first if it already exists. The output file won't have encoded holes; read errors in infile will result in zeros.
- dd_rescue -aL MD5,lzo=compr:bench,MD5,lzo=decompress,MD5 infile infile2
will copy infile to infile2, compressing the data and decompressing it again on the fly. It will output MD5 hashes of the compressed data (though it is not stored) and of the uncompressed data before and after the round trip; the latter two should obviously be identical. This command is rather artificial and used for testing. The -a flag makes dd_rescue detect zero blocks and create holes, thus also testing hole encoding (sparse files) and decoding if infile has sizable regions filled with zeros.
- dd_rescue -s1M -S0 -L lzo=search,nodiscard infile.lzo outfile
will search for an lzop block header in infile.lzo starting at position 1MiB into the file (-s1M) and decompress the remainder of the file, writing to outfile from its start (-S0). On finding corrupted blocks, it will still write the decompressor's output to outfile.
See Also
dd_rescue(1), liblzo2 documentation, lzop(1)
Author
Kurt Garloff <kurt@garloff.de>
Credits
The liblzo2 library and algorithms have been written by Markus Oberhumer.
http://www.oberhumer.com/opensource/lzo/
Copyright
This plugin is under the same license as dd_rescue: The GNU General Public License (GPL) v2 or v3 - at your option.
History
ddr_lzo plugin was first introduced with dd_rescue 1.43 (May 2014).
Some additional information can be found on
http://garloff.de/kurt/linux/ddrescue/