ugrep-indexer - Man Page

file indexer to accelerate recursive searching

Synopsis

ugrep-indexer [-0...9] [-c|-d|-f] [-I] [-q] [-S] [-s] [-X] [-z] [PATH]

Description

The ugrep-indexer utility recursively indexes files to accelerate recursive searching with the ug --index PATTERN commands:

$ ugrep-indexer [-I] [-z]

 ...

$ ug --index [-I] [-z] [-r|-R] OPTIONS PATTERN

$ ugrep --index [-I] [-z] [-r|-R] OPTIONS PATTERN

where option -I or --ignore-binary ignores binary files, which is recommended to limit indexing storage overhead and to reduce search time. Option -z or --decompress indexes and searches archives and compressed files.

Indexing speeds up searching file systems that are large and cold (not recently cached in RAM) and file systems that are generally slow to search.  Note that indexing may not speed up searching few files or recursively searching fast file systems.

Searching with ug --index is safe and never skips files that were modified after indexing and may now match; the ug --index PATTERN command always searches files and directories that were added or modified after indexing. When option --stats is used with ug --index, a search report is produced showing the number of files skipped because they do not match any index and the number of files and directories that were added or modified after indexing. Note that searching with ug --index may significantly increase the start-up time when complex regex patterns are specified that contain large Unicode character classes combined with `*' or `+' repeats; such patterns should be avoided.

ugrep-indexer stores a hidden index file in each directory indexed.  The size of an index file depends on the number of files indexed and the specified indexing accuracy.  Higher accuracy produces larger index files to improve search performance by reducing false positives (a false positive is a match prediction for a file that does not actually match the regex pattern).

ugrep-indexer accepts an optional PATH to the root of the directory tree to index.  The default is to index the working directory tree.

ugrep-indexer incrementally updates indexes.  To force reindexing, specify option -f or --force.  Indexes are deleted with option -d or --delete.

ugrep-indexer may be stopped and restarted to continue indexing at any time.  Incomplete index files do not cause errors.

ASCII, UTF-8, UTF-16 and UTF-32 files are indexed and searched as text files unless their UTF encoding is invalid.  Files with other encodings are indexed as binary files and can be searched with non-Unicode regex patterns using ug --index -U.

When ugrep-indexer option -I or --ignore-binary is specified, binary files are ignored and not indexed.  To avoid searching these non-indexed binary files, also specify option -I when searching: ug --index -I.

With ugrep-indexer option -X or --ignore-files, gitignore rules are respected: files and directories matching these rules are not indexed.  Likewise, to avoid searching these non-indexed ignored files, also specify option --ignore-files when searching: ug --index --ignore-files.
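As an illustration of keeping the indexing and search options aligned, the following shell sketch passes the same file-selection options to both commands.  The variable name SHARED_OPTS, the sample file and the scratch directory are assumptions made for this example; only the options -I, --ignore-files, -q and -r are taken from the ugrep-indexer and ug documentation.

```shell
#!/bin/sh
# Keep the file-selection options identical for indexing and searching,
# so that files left out of the index are also left out of the search.
# SHARED_OPTS is a name chosen for this sketch.
SHARED_OPTS="-I --ignore-files"

if command -v ugrep-indexer >/dev/null 2>&1 && command -v ug >/dev/null 2>&1; then
    # Work in a scratch directory so the example leaves no index files behind.
    scratch=$(mktemp -d)
    printf 'hello world\n' > "$scratch/sample.txt"
    (
        cd "$scratch" || exit 1
        ugrep-indexer $SHARED_OPTS -q || echo "indexing reported a problem"
        ug --index $SHARED_OPTS -r 'hello' || echo "no match found"
    )
    rm -rf "$scratch"
else
    # Tools not installed: show the commands that would be run.
    echo "ugrep-indexer $SHARED_OPTS"
    echo "ug --index $SHARED_OPTS -r PATTERN"
fi
```

The long form --ignore-files is used for both commands here, since that is the spelling shown above for ug --index.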

Archives and compressed files are indexed with ugrep-indexer option -z or --decompress.  Otherwise, archives and compressed files are indexed as binary files or are ignored with option -I or --ignore-binary.  Note that once an archive or compressed file is indexed as a binary file, it will not be reindexed with option -z to index its contents.  Only files that are modified after indexing are reindexed, which is determined by comparing time stamps; option -f or --force reindexes files that are already indexed.

Symlinked files are indexed with ugrep-indexer option -S or --dereference-files.  Symlinks to directories are never followed.  

To save a log file of the indexing process, specify option -v or --verbose and redirect standard output to a log file.  All messages and warnings are sent to standard output and captured by the log file.
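A minimal sketch of the logging setup described above.  The log file location LOG and the ROOT variable are choices made for this example, not defaults of ugrep-indexer.

```shell
#!/bin/sh
# ROOT is the directory tree to index; LOG is where the output is saved.
# Both names and the log path are assumptions of this sketch.
ROOT="${ROOT:-.}"
LOG="${TMPDIR:-/tmp}/ugrep-indexer.log"

if command -v ugrep-indexer >/dev/null 2>&1; then
    # -v produces verbose output; standard output is redirected to the log.
    ugrep-indexer -I -v "$ROOT" > "$LOG" || echo "ugrep-indexer reported a problem"
else
    # Record the missing tool in the log instead of failing silently.
    echo "ugrep-indexer not found in PATH" > "$LOG"
fi

echo "log written to $LOG ($(wc -l < "$LOG" | tr -d ' ') lines)"
```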

A .ugrep-indexer configuration file with configuration options is loaded when present in the working directory or in the home directory.  A configuration option consists of the name of a long option and its argument when applicable.
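For illustration, the sketch below writes a sample .ugrep-indexer file into a scratch directory.  The layout of one long option name per line, with =ARG appended where the option takes an argument, is an assumption derived from the description above and is not confirmed here; the option names themselves are those listed in this manual.

```shell
#!/bin/sh
# Write a sample configuration file to a scratch directory (not to the
# home directory) so that nothing existing is overwritten.
# Assumed layout: one long option per line, without the leading "--".
confdir=$(mktemp -d)
cat > "$confdir/.ugrep-indexer" <<'EOF'
ignore-binary
decompress
accuracy=5
EOF

# Show the result for inspection.
cat "$confdir/.ugrep-indexer"
```

To try it out, the file can be copied to the working directory from which ugrep-indexer is started.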

The following options are available:

-0,  -1,  -2,  -3, ..., -9,  --accuracy=DIGIT

Specifies indexing accuracy.  A low accuracy reduces the indexing storage overhead at the cost of a higher rate of false positive pattern matches (more noise).  A high accuracy reduces the rate of false positive regex pattern matches (less noise) at the cost of an increased indexing storage overhead.  An accuracy between 2 and 7 is recommended.  The default accuracy is 4.

-., --hidden

Index hidden files and directories.

-?,  --help

Display a help message and exit.

-c,  --check

Recursively check and report indexes without reindexing files.

-d,  --delete

Recursively remove index files.

-f,  --force

Force reindexing of files, even those that are already indexed.

-I,  --ignore-binary

Do not index binary files.

-q,  --quiet,  --silent

Quiet mode: do not display indexing statistics.

-S,  --dereference-files

Follow symbolic links to files.  Symbolic links to directories are never followed.

-s,  --no-messages

Silent mode: nonexistent and unreadable files are ignored, i.e. their error messages and warnings are suppressed.

-V,  --version

Display version and exit.

-v,  --verbose

Produce verbose output.  Files are marked A for archive, C for compressed, B for binary, or I for ignored binary.  Deletions are marked D.

-X,  --ignore-files, --ignore-files=FILE

Do not index files and directories matching the globs in FILE encountered during indexing.  The default FILE is `.gitignore'. This option may be repeated to specify additional files.

-z,  --decompress

Index the contents of compressed files and archives.  Hidden files in archives are ignored unless option -. or --hidden is specified. Option -I or --ignore-binary ignores compressed binary files.  When used with option --zmax=NUM, indexes the contents of compressed files and archives stored within archives up to NUM levels deep. Supported compression formats: gzip (.gz), compress (.Z), zip, 7z, bzip2 (requires suffix .bz, .bz2, .bzip2, .tbz, .tbz2, .tb2, .tz2), lzma and xz (requires suffix .lzma, .tlz, .xz, .txz), lz4 (requires suffix .lz4), zstd (requires suffix .zst, .zstd, .tzst), brotli (requires suffix .br).

--zmax=NUM

When used with option -z (--decompress), indexes the contents of compressed files and archives stored within archives up to NUM expansion levels deep.  The default --zmax=1 only permits indexing uncompressed files stored in cpio, pax, tar, zip and 7z archives; compressed files and archives stored within them are detected as binary files and are effectively ignored.  Specify --zmax=2 to index compressed files and archives stored in cpio, pax, tar, zip and 7z archives.  NUM may range from 1 to 99 for up to 99 decompression and de-archiving steps.  Increasing NUM gradually degrades performance.
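The NUM range can be checked before invoking the indexer.  The helper zmax_opts below is hypothetical and not part of ugrep-indexer; it only assembles the -z and --zmax=NUM options documented above and rejects values outside 1 to 99.

```shell
#!/bin/sh
# zmax_opts NUM: print "-z --zmax=NUM" if NUM is an integer from 1 to 99,
# otherwise return a non-zero status.  Hypothetical helper for this sketch.
zmax_opts() {
    case "$1" in
        ''|*[!0-9]*|???*) return 1 ;;   # empty, non-digit, or 3+ characters
    esac
    if [ "$1" -lt 1 ] || [ "$1" -gt 99 ]; then
        return 1
    fi
    printf '%s\n' "-z --zmax=$1"
}

# Example: two expansion levels, to reach compressed files inside archives.
if opts=$(zmax_opts 2); then
    echo "ugrep-indexer $opts -I"
    echo "ug --index $opts -I PATTERN"
fi
```

The sketch only prints the two command lines, with the same -z and --zmax options for indexing and for searching.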

Exit Status

The ugrep-indexer utility exits with one of the following values:

0

Indexes are up to date.

1

The indexing check with option -c detected missing or outdated index files.
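The exit status lends itself to scripting.  The sketch below runs the check first and reindexes only when the check does not report success; the variable names ROOT, OPTS and status are chosen for this example, and reusing the same options for the check as for indexing is an assumption.

```shell
#!/bin/sh
# ROOT: directory tree to check; OPTS: options used when the tree was indexed.
ROOT="${ROOT:-.}"
OPTS="-I"
status="skipped"   # stays "skipped" when ugrep-indexer is not installed

if command -v ugrep-indexer >/dev/null 2>&1; then
    if ugrep-indexer -c -q $OPTS "$ROOT"; then
        # Exit status 0: indexes are up to date.
        status="up to date"
    else
        # Non-zero exit status: update the indexes incrementally.
        status="stale"
        ugrep-indexer -q $OPTS "$ROOT" || echo "indexing reported a problem"
    fi
fi

echo "index status for $ROOT: $status"
```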

Examples

Recursively and incrementally index all non-binary files showing progress:

$ ugrep-indexer -I -v

Recursively and incrementally index all non-binary files, including non-binary files stored in archives and in compressed files, showing progress:

$ ugrep-indexer -z -I -v

Incrementally index all non-binary files, including archives and compressed files, show progress, follow symbolic links to files (but not to directories), but do not index files and directories matching the globs in .gitignore:

$ ugrep-indexer -z -I -v -S -X

Force re-indexing of all non-binary files, including archives and compressed files, follow symbolic links to files (but not to directories), but do not index files and directories matching the globs in .gitignore:

$ ugrep-indexer -f -z -I -v -S -X

Same, but decrease index file storage to a minimum by decreasing indexing accuracy from 4 (the default) to 0:

$ ugrep-indexer -f -0 -z -I -v -S -X

Increase search performance by increasing the indexing accuracy from 4 (the default) to 7 at a cost of larger index files:

$ ugrep-indexer -f7zIvSX

Recursively delete all hidden ._UG#_Store index files to restore the directory tree to its non-indexed state:

$ ugrep-indexer -d

See Also

ug(1), ugrep(1).

Bugs

Report bugs at:

https://github.com/Genivia/ugrep-indexer/issues

November 12, 2024 ugrep-indexer 7.0.4