dbcolsdecimate - Man Page
drop rows selectively, keeping large changes and periodic samples
Synopsis
dbcolsdecimate [-p RELATIVE_PREC] [-P ABSOLUTE_PREC] column1 [column2...]
Description
For each of the given columns, prune the data so that it still shows changes to within at most RELATIVE_PRECISION fraction of the column's total range (default: 0.01; alternatively, one can specify an absolute precision). This tool is designed to reduce the amount of data behind a graph while keeping the graph visually identical.
Precisions, if specified, apply to any subsequent columns. (One can therefore have different precisions for different columns.)
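As an illustration of how a precision carries forward to later columns, here is a minimal Python sketch. It is not the tool's own option parser: the function name assign_precisions and the tuple layout are hypothetical, and it assumes that a later -p or -P simply replaces whichever precision was in effect before it.

```python
def assign_precisions(args, default_relative=0.01):
    """Pair each column name with the precision in effect when it appears.

    Hypothetical helper: it mimics the "applies to any subsequent columns"
    rule and does not reproduce the real tool's option handling.
    """
    mode, value = "relative", default_relative
    columns = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("-p", "-P"):
            if i + 1 >= len(args):
                raise ValueError(f"{arg} needs a value")
            # -p selects a relative precision, -P an absolute one.
            mode = "relative" if arg == "-p" else "absolute"
            value = float(args[i + 1])
            i += 2
        else:
            # Anything else is a column name; it takes the current precision.
            columns.append((arg, mode, value))
            i += 1
    return columns


print(assign_precisions(["-p", "0.1", "x", "-p", "0.2", "y"]))
```

Under these assumptions, column x is paired with relative precision 0.1 and column y with 0.2, as in the sample command later in this page.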
With multiple columns, major changes in any column cause a record to be emitted.
Our goal is to output an identical plot, with fewer points if we can. This goal differs from, and is easier than, prior published work that has the goal of reducing the number of points by a known factor, or to a constant number, while preserving as much fidelity as possible.
We usually put out a pair of points at each change, so that if the data has stairsteps, they don't turn into diagonals.
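The pairing rule can be illustrated with a minimal Python sketch. It is reconstructed from the sample output later in this page, not taken from the tool's source. It assumes a row is kept when any column differs from the last kept row by strictly more than that column's threshold, and that the row just before the change is kept along with it. The name decimate and its arguments are hypothetical, and the sketch gives no special treatment to the final row.

```python
def decimate(rows, thresholds):
    """Keep rows where some column moved more than its threshold.

    rows: sequence of numeric tuples, one value per decimated column.
    thresholds: one absolute threshold per column.
    This is an assumed reading of the behavior, not the real algorithm.
    """
    kept = []
    last_kept = None       # most recently kept row
    previous = None        # row immediately before the current one
    previous_kept = False  # whether that previous row was already kept
    for row in rows:
        changed = last_kept is None or any(
            abs(value - base) > limit
            for value, base, limit in zip(row, last_kept, thresholds)
        )
        if changed:
            # Keep the row before the change too, so a flat run followed
            # by a jump still plots as a step rather than a diagonal.
            if previous is not None and not previous_kept:
                kept.append(previous)
            kept.append(row)
            last_kept = row
        previous, previous_kept = row, changed
    return kept


# A flat run at y=0 followed by a jump to y=10.
steps = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 10)]
print(decimate(steps, thresholds=(10, 5)))
```

In this small example the sketch keeps (0, 0), (3, 0) and (4, 10). Without the extra point at (3, 0), a plot would draw a straight line from (0, 0) to (4, 10).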
Be aware that relative precision is computed from the range of the data, so it is sensitive to outliers. Verbose output (-v) shows the actual precision that is promised, allowing one to adjust it manually if necessary (with -P).
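The effect of an outlier can be seen with a little arithmetic. The sketch below assumes the threshold is the relative precision multiplied by the column's range (maximum minus minimum), which is what the verbose lines in the sample output report. The function name relative_threshold and the data are made up for illustration.

```python
def relative_threshold(values, precision=0.01):
    """Threshold implied by a relative precision: precision * (max - min)."""
    return precision * (max(values) - min(values))


clean = [0, 20, 40, 60, 80, 100]
with_outlier = clean + [10_000]  # one stray value stretches the range

# Range 100: changes larger than about 10 units are preserved.
print(relative_threshold(clean, 0.1))

# Range 10000: the threshold is about 1000, so every change in the
# original 0..100 data falls below it.
print(relative_threshold(with_outlier, 0.1))
```

When an outlier inflates the threshold this way, an absolute precision given with -P avoids depending on the range at all.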
By default this program temporarily stores a complete copy of the input data on disk. However, if all columns are given absolute precisions, this program runs with constant memory.
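The reason relative precisions need a stored copy is that a column's range is only known after all of the input has been read. The following Python sketch shows one way such a two-pass scheme could look; it is an assumption about the general approach, not a description of the tool's internals, and the name thresholds_from_spool is hypothetical. With absolute precisions the thresholds are known up front, so rows can be processed as they arrive and this step is unnecessary.

```python
import tempfile


def thresholds_from_spool(lines, precisions):
    """First pass: copy rows to a temp file while tracking each column's range.

    Returns the per-column thresholds and the open temp file, rewound so a
    second pass can replay the rows. The caller is responsible for closing it.
    """
    spool = tempfile.TemporaryFile(mode="w+")
    lows, highs = None, None
    for line in lines:
        values = [float(field) for field in line.split()]
        lows = values if lows is None else [min(a, b) for a, b in zip(lows, values)]
        highs = values if highs is None else [max(a, b) for a, b in zip(highs, values)]
        spool.write(line.rstrip("\n") + "\n")
    if lows is None:
        spool.close()
        raise ValueError("no input rows")
    spool.seek(0)
    thresholds = [p * (hi - lo) for p, lo, hi in zip(precisions, lows, highs)]
    return thresholds, spool


thresholds, spool = thresholds_from_spool(["0 0", "1 50", "100 100"], [0.1, 0.2])
with spool:
    replayed = [line.split() for line in spool]
print(thresholds, len(replayed))
```

Python's tempfile module consults the TMPDIR environment variable when choosing a directory, which loosely mirrors the -T and TMPDIR behavior described under Options.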
Options
- --precision-relative P or --relative-precision P or -p P
Set the precision as the fraction of the total range that should be preserved. Applies to any subsequent columns. Default: 0.01.
- --precision-absolute P or --absolute-precision P or -P P
Set the precision in absolute units. Applies to any subsequent columns.
- -T TmpDir
Where to put temporary files. If -T is not specified, the TMPDIR environment variable is used. Default: /tmp.
This module also supports the standard fsdb options:
- -d
Enable debugging output.
- -v
Enable verbose output.
- -i or --input InputSource
Read from InputSource, typically a file name, or - for standard input, or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.
- -o or --output OutputDestination
Write to OutputDestination, typically a file name, or - for standard output, or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.
- --autorun or --noautorun
By default, programs process automatically, but Fsdb::Filter objects in Perl do not run until you invoke the run() method. The --(no)autorun option controls that behavior within Perl.
- --help
Show help.
- --man
Show full manual.
Sample Usage
Input
#fsdb x y
0	0
1	50
2	50
3	50
4	50
5	50
6	50
7	50
8	50
9	50
10	50
11	50
12	50
13	50
14	50
15	50
16	50
17	50
18	50
19	50
20	50
21	50
22	50
23	50
24	50
25	50
26	50
27	50
28	50
29	50
30	50
31	50
32	50
33	50
34	50
35	50
36	50
37	50
38	50
39	50
40	50
41	50
42	50
43	50
44	50
45	50
46	50
47	50
48	50
49	50
50	50
50	51
50	52
50	53
50	54
50	55
50	56
50	57
50	58
50	59
50	60
50	61
50	62
50	63
50	64
50	65
50	66
50	67
50	68
50	69
50	70
50	71
50	72
50	73
50	74
50	75
50	76
50	77
50	78
50	79
50	80
50	81
50	82
50	83
50	84
50	85
50	86
50	87
50	88
50	89
50	90
50	91
50	92
50	93
50	94
50	95
50	96
50	97
50	98
50	99
100	100
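The sample input follows a simple pattern: x climbs from 1 to 50 while y stays at 50, then y climbs from 51 to 99 while x stays at 50, with (0, 0) and (100, 100) at the two ends. For readers who want to regenerate it, here is a short Python sketch. The function name sample_input_lines is hypothetical, and the tab between fields is an assumption about the separator this header implies.

```python
def sample_input_lines():
    """Rebuild the 101-row sample input, header line first."""
    rows = [(0, 0)]
    rows += [(x, 50) for x in range(1, 51)]    # x rises, y holds at 50
    rows += [(50, y) for y in range(51, 100)]  # y rises, x holds at 50
    rows.append((100, 100))
    # Assumed field separator: a single tab.
    return ["#fsdb x y"] + [f"{x}\t{y}" for x, y in rows]


if __name__ == "__main__":
    print("\n".join(sample_input_lines()))
```

Redirecting the printed lines to a file gives input that can be piped into the command shown next.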
Command
dbcolsdecimate -v -p 0.1 x -p 0.2 y
Output
(from TEST/dbcolsdecimate_linear_different.out):
#fsdb x y
# column x with range 100 and relative precision 0.1 gives threshold 10
# column y with range 100 and relative precision 0.2 gives threshold 20
0	0
1	50
11	50
12	50
22	50
23	50
33	50
34	50
44	50
45	50
50	70
50	71
50	91
50	92
50	99
100	100
# output 16 of 101 (0.1584)
# | dbcolsdecimate -v -p 0.1 x -p 0.2 y
See Also
Fsdb, dbcolmovingstats.
Author and Copyright
Copyright (C) 2023 by John Heidemann <johnh@isi.edu>
This program is distributed under the terms of the GNU General Public License, version 2. See the file COPYING with the distribution for details.