ugrep-indexer - Man Page
file indexer to accelerate recursive searching
Synopsis
ugrep-indexer [-0...9] [-c|-d|-f] [-I] [-q] [-S] [-s] [-X] [-z] [PATH]
Description
The ugrep-indexer utility recursively indexes files to accelerate recursive searching with the ug --index PATTERN commands:
...
$ ug --index [-I] [-z] [-r|-R] OPTIONS PATTERN
$ ugrep --index [-I] [-z] [-r|-R] OPTIONS PATTERN
where option -I or --ignore-binary ignores binary files, which is recommended to limit indexing storage overhead and to reduce search time. Option -z or --decompress indexes and searches archives and compressed files.
Indexing speeds up searching file systems that are large and cold (not recently cached in RAM) and file systems that are generally slow to search. Note that indexing may not speed up searching a small number of files or recursively searching fast file systems.
Searching with ug --index is safe: it never skips files that were modified after indexing and may now match. The ug --index PATTERN command always searches files and directories that were added or modified after indexing. When option --stats is used with ug --index, a search report is produced showing the number of files skipped because they do not match any indexes and the number of files and directories that were added or modified after indexing. Note that searching with ug --index may significantly increase the start-up time when complex regex patterns are specified that contain large Unicode character classes combined with `*' or `+' repeats. Such patterns should be avoided.
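The index-then-search workflow described above can be sketched as a small shell function. This is a minimal sketch, not part of ugrep: the function name index_and_search and its two arguments are hypothetical, and it assumes ugrep-indexer and ug are installed and on the PATH.

```shell
# Hypothetical helper: index DIR, then search it for PATTERN using the indexes.
index_and_search() {
    dir="$1"
    pattern="$2"
    # Index text files only (-I) and suppress indexing statistics (-q).
    ugrep-indexer -I -q "$dir"
    # Search recursively (-r) with the same -I option; --stats prints the
    # search report. The exit status is that of ug.
    (cd "$dir" && ug --index -I -r --stats "$pattern")
}
```

For example, index_and_search ~/projects 'TODO' indexes ~/projects and then reports how many files the indexes allowed the search to skip.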
ugrep-indexer stores a hidden index file in each directory indexed. The size of an index file depends on the number of files indexed and the specified indexing accuracy. Higher accuracy produces larger index files to improve search performance by reducing false positives. A false positive is a match prediction for a file when the file does not match the regex pattern.
ugrep-indexer accepts an optional PATH to the root of the directory tree to index. The default is to index the working directory tree.
ugrep-indexer incrementally updates indexes. To force reindexing, specify option -f or --force. Indexes are deleted with option -d or --delete.
ugrep-indexer may be stopped and restarted to continue indexing at any time. Incomplete index files do not cause errors.
ASCII, UTF-8, UTF-16 and UTF-32 files are indexed and searched as text files unless their UTF encoding is invalid. Files with other encodings are indexed as binary files and can be searched with non-Unicode regex patterns using ug --index -U.
When ugrep-indexer option -I or --ignore-binary is specified, binary files are ignored and not indexed. To avoid searching these non-indexed binary files, also specify option -I when searching: ug --index -I.
ugrep-indexer option -X or --ignore-files respects gitignore rules. Likewise, to avoid searching the non-indexed ignored files, also specify option --ignore-files when searching: ug --index --ignore-files.
Archives and compressed files are indexed with ugrep-indexer option -z or --decompress. Otherwise, archives and compressed files are indexed as binary files or are ignored with option -I or --ignore-binary. Note that once an archive or compressed file is indexed as a binary file, it will not be reindexed with option -z to index the contents of the archive or compressed file. Only files that are modified after indexing are reindexed, which is determined by comparing time stamps.
Symlinked files are indexed with ugrep-indexer option -S or --dereference-files. Symlinks to directories are never followed.
To save a log file of the indexing process, specify option -v or --verbose and redirect standard output to a log file. All messages and warnings are sent to standard output and captured by the log file.
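Saving a log as described above might look like the following sketch. The function name index_with_log and the default log file name are hypothetical and chosen only for illustration.

```shell
# Hypothetical helper: index DIR verbosely and save all messages to LOGFILE.
index_with_log() {
    dir="${1:-.}"
    logfile="${2:-ugrep-indexer.log}"
    # -v reports each file; messages and warnings go to standard output,
    # so a plain redirect captures them. -I skips binary files.
    ugrep-indexer -I -v "$dir" > "$logfile"
}
```

The log then lists the indexed files with the A, C, B, I and D markers described under option -v.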
A .ugrep-indexer configuration file with configuration options is loaded when present in the working directory or in the home directory. A configuration option consists of the name of a long option and its argument when applicable.
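As an illustration only, a configuration file could be created as shown below. The exact file syntax is an assumption here: the sketch assumes one long option name per line, written without the leading dashes, with any argument given as name=value. The function name write_indexer_config is hypothetical, and the function overwrites an existing .ugrep-indexer file.

```shell
# Hypothetical helper: write a .ugrep-indexer file into DIR
# (default: the working directory). Overwrites an existing file.
write_indexer_config() {
    dir="${1:-.}"
    # Assumed syntax: one long option name per line, no leading dashes.
    # These lines correspond to options -I, -X and -5.
    cat > "$dir/.ugrep-indexer" <<'EOF'
ignore-binary
ignore-files
accuracy=5
EOF
}
```

With such a file in place, running ugrep-indexer without options in that directory would be expected to behave like ugrep-indexer -I -X -5, if the assumed syntax holds.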
The following options are available:
- -0, -1, -2, -3, ..., -9, --accuracy=DIGIT
Specifies indexing accuracy. A low accuracy reduces the indexing storage overhead at the cost of a higher rate of false positive pattern matches (more noise). A high accuracy reduces the rate of false positive regex pattern matches (less noise) at the cost of an increased indexing storage overhead. An accuracy between 2 and 7 is recommended. The default accuracy is 4.
- -., --hidden
Index hidden files and directories.
- -?, --help
Display a help message and exit.
- -c, --check
Recursively check and report indexes without reindexing files.
- -d, --delete
Recursively remove index files.
- -f, --force
Force reindexing of files, even those that are already indexed.
- -I, --ignore-binary
Do not index binary files.
- -q, --quiet, --silent
Quiet mode: do not display indexing statistics.
- -S, --dereference-files
Follow symbolic links to files. Symbolic links to directories are never followed.
- -s, --no-messages
Silent mode: nonexistent and unreadable files are ignored, i.e. their error messages and warnings are suppressed.
- -V, --version
Display version and exit.
- -v, --verbose
Produce verbose output. Files are marked A for archive, C for compressed, and B for binary or I for ignored binary. Deletions are marked D.
- -X, --ignore-files, --ignore-files=FILE
Do not index files and directories matching the globs in FILE encountered during indexing. The default FILE is `.gitignore'. This option may be repeated to specify additional files.
- -z, --decompress
Index the contents of compressed files and archives. Hidden files in archives are ignored unless option -. or --hidden is specified. Option -I or --ignore-binary ignores compressed binary files. When used with option --zmax=NUM, indexes the contents of compressed files and archives stored within archives up to NUM levels deep. Supported compression formats: gzip (.gz), compress (.Z), zip, 7z, bzip2 (requires suffix .bz, .bz2, .bzip2, .tbz, .tbz2, .tb2, .tz2), lzma and xz (requires suffix .lzma, .tlz, .xz, .txz), lz4 (requires suffix .lz4), zstd (requires suffix .zst, .zstd, .tzst), brotli (requires suffix .br).
- --zmax=NUM
When used with option -z (--decompress), indexes the contents of compressed files and archives stored within archives, up to NUM expansion levels deep. The default --zmax=1 only permits indexing uncompressed files stored in cpio, pax, tar, zip and 7z archives; compressed files and archives stored in these archives are detected as binary files and are effectively ignored. Specify --zmax=2 to index compressed files and archives stored in cpio, pax, tar, zip and 7z archives. NUM may range from 1 to 99 for up to 99 decompression and de-archiving steps. Increasing NUM gradually degrades performance.
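Combining -z with --zmax, as described for the last two options, can be sketched as follows. The function name index_nested_archives is hypothetical.

```shell
# Hypothetical helper: index DIR including compressed files nested in archives.
index_nested_archives() {
    dir="${1:-.}"
    # -z indexes the contents of archives and compressed files; --zmax=2
    # also expands compressed files and archives stored inside cpio, pax,
    # tar, zip and 7z archives. -I skips binary files.
    ugrep-indexer -z --zmax=2 -I "$dir"
}
```

A matching search would presumably pass the same options, for example ug --index -z --zmax=2 -I PATTERN, so that the search covers what was indexed.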
Exit Status
The ugrep-indexer utility exits with one of the following values:
- 0
Indexes are up to date.
- 1
Indexing check -c detected missing or outdated index files.
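The exit status makes it possible to update indexes only when the check finds them stale, for instance from a scheduled job. The following is a minimal sketch: the function name refresh_index is hypothetical, and passing the same -I option to both the check and the indexing run is an assumption made to keep the two runs consistent.

```shell
# Hypothetical helper: check indexes under DIR and update them only if needed.
refresh_index() {
    dir="${1:-.}"
    status=0
    # -c checks without reindexing; -q suppresses indexing statistics.
    ugrep-indexer -c -I -q "$dir" || status=$?
    case $status in
        0) echo "indexes up to date: $dir" ;;
        1) echo "updating indexes: $dir"
           ugrep-indexer -I -q "$dir" ;;
        *) # Any other status is not documented above; treat it as an error.
           echo "index check failed: $dir" >&2
           return 2 ;;
    esac
}
```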
Examples
Recursively and incrementally index all non-binary files, showing progress:
$ ugrep-indexer -I -v
Recursively and incrementally index all non-binary files, including non-binary files stored in archives and in compressed files, showing progress:
$ ugrep-indexer -z -I -v
Incrementally index all non-binary files, including archives and compressed files, show progress, follow symbolic links to files (but not to directories), but do not index files and directories matching the globs in .gitignore:
$ ugrep-indexer -z -I -v -S -X
Force re-indexing of all non-binary files, including archives and compressed files, follow symbolic links to files (but not to directories), but do not index files and directories matching the globs in .gitignore:
$ ugrep-indexer -f -z -I -v -S -X
Same, but decrease index file storage to a minimum by decreasing indexing accuracy from 4 (the default) to 0:
$ ugrep-indexer -f -0 -z -I -v -S -X
Increase search performance by increasing the indexing accuracy from 4 (the default) to 7 at a cost of larger index files:
$ ugrep-indexer -f7zIvSX
Recursively delete all hidden ._UG#_Store index files to restore the directory tree to a non-indexed state:
$ ugrep-indexer -d
Copyright
Copyright (c) 2021-2024 Robert A. van Engelen <engelen@acm.org>
ugrep-indexer is released under the BSD-3 license. All parts of the software have reasonable copyright terms permitting free redistribution. This includes the ability to reuse all or parts of the ugrep source tree.
See Also
Bugs
Report bugs at: