triehash - Man Page
Generate a perfect hash function derived from a trie.
Synopsis
triehash [option] [input file]
Description
triehash reads a list of words from input file and generates a function that maps each word to a value, together with an enumeration describing the words.
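To make the idea concrete, the following hand-written sketch shows the kind of lookup a trie-derived perfect hash boils down to, for the hypothetical words hack, hash and head. The names SketchKey and sketch_hash, and the value -1 for Unknown, are assumptions made for this illustration; the code triehash actually emits may be structured differently.

```c
#include <stddef.h>

/* Hypothetical enumeration for the words "hack", "hash", "head".
 * Unknown = -1 is an assumption made for this sketch. */
enum SketchKey { Unknown = -1, hack, hash, head };

/* Hand-written trie walk: each switch corresponds to one trie node,
 * so every input byte is inspected at most once. */
static enum SketchKey sketch_hash(const char *s, size_t len)
{
    if (len != 4)        /* all three words have length 4 */
        return Unknown;
    if (s[0] != 'h')     /* root node: single child 'h' */
        return Unknown;
    switch (s[1]) {
    case 'a':            /* prefix "ha": children 'c' and 's' */
        switch (s[2]) {
        case 'c': return s[3] == 'k' ? hack : Unknown;
        case 's': return s[3] == 'h' ? hash : Unknown;
        }
        break;
    case 'e':            /* prefix "he": only "head" remains */
        if (s[2] == 'a' && s[3] == 'd')
            return head;
        break;
    }
    return Unknown;
}
```

Because the words are known in advance, no collisions are possible: a string either follows a complete path through the trie or it is rejected.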
Input File Format
The file consists of multiple lines of the form:
[label ~ ] word [= value]
This maps word to value, and generates an enumeration with entries of the form:
label = value
If label is undefined, the word is used as the label, with each minus character replaced by an underscore. If value is undefined, it is counted upwards from the last value.
There may also be one line of the form:
[label ~ ] = value
This defines the value to be used for non-existing keys. Note that this also changes the default value for other keys, just as a normal entry does. For example, if you place
= 0
at the beginning of the file, unknown strings map to 0, and the other strings map to values starting with 1. If label is not specified, the default is Unknown.
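As an illustration of these rules, the comment below shows a hypothetical input file and the enumeration one would expect from applying the rules by hand. The words, the label Baz and the enumeration name ExampleKey are made up for this sketch, and the layout of the real generated header may differ.

```c
/*
 * Hypothetical input file:
 *
 *     = 0
 *     foo
 *     foo-bar
 *     Baz ~ baz = 10
 *     qux
 */
enum ExampleKey {
    Unknown = 0,   /* "= 0": value for non-existing keys, default label */
    foo     = 1,   /* no value given: counted upwards from the last value */
    foo_bar = 2,   /* no label given: word used, '-' replaced by '_' */
    Baz     = 10,  /* explicit label and explicit value */
    qux     = 11   /* counting continues from the explicit value */
};
```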
Options
- -C .c file, --code=.c file
Generate code in the given file.
- -H header file, --header=header file
Generate a header in the given file, containing a declaration of the hash function and an enumeration.
- --enum-name=word
The name of the enumeration.
- --function-name=word
The name of the function.
- --label-prefix=word
The prefix to use for labels.
- --label-uppercase
Uppercase label names when normalizing them.
- --namespace=name
Put the function and enum into a namespace (C++).
- --class=name
Put the function and enum into a class (C++).
- --enum-class
Generate an enum class instead of an enum (C++).
- --counter-name=name
Use name for a counter that is set to the latest entry in the enumeration + 1. This can be useful for defining array sizes.
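The following sketch shows how such a counter can size a lookup table. The enumeration, the counter name KeyCount and the way the counter is declared here are assumptions for illustration, not the exact declarations triehash emits.

```c
/* Hypothetical enumeration and counter, as if generated with
 * --counter-name=KeyCount (declaration style assumed for this sketch). */
enum ExampleKey { Unknown = -1, foo, bar, baz };
enum { KeyCount = baz + 1 };   /* latest entry in the enumeration + 1 */

/* The counter gives the table exactly one slot per known key. */
static const char *const key_names[KeyCount] = {
    [foo] = "foo",
    [bar] = "bar",
    [baz] = "baz",
};

/* Map a key back to its word, guarding against out-of-range values. */
static const char *key_name(enum ExampleKey k)
{
    int i = (int)k;
    return (i >= 0 && i < KeyCount) ? key_names[i] : "(unknown)";
}
```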
- --ignore-case
Ignore case for words.
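One common way to ignore ASCII case in a byte-wise comparison is to set bit 0x20 of the input byte before comparing it with a lowercase letter. The sketch below illustrates that technique for the hypothetical word hash; whether the code generated by triehash uses exactly this trick is not stated here, and the helper names are invented.

```c
#include <stddef.h>

/* Compare one input byte against a lowercase ASCII letter, ignoring case.
 * 'A'..'Z' and 'a'..'z' differ only in bit 0x20, so setting that bit
 * folds both cases onto the lowercase letter. Only valid when `lower`
 * is a lowercase letter; other characters must be compared exactly. */
static int eq_nocase(char c, char lower)
{
    return (c | 0x20) == lower;
}

/* Returns nonzero if s spells "hash" in any mix of upper and lower case. */
static int matches_hash(const char *s, size_t len)
{
    return len == 4
        && eq_nocase(s[0], 'h')
        && eq_nocase(s[1], 'a')
        && eq_nocase(s[2], 's')
        && eq_nocase(s[3], 'h');
}
```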
- --multi-byte=value
Generate code reading multiple bytes at once. The value is a string of powers of two to enable. The default value is 320, meaning that 8-byte, 4-byte, and single-byte reads are enabled. Specify 0 to disable multi-byte reads completely, or add 2 if you also want to allow 2-byte reads. 2-byte reads are disabled by default because they negatively affect performance on older Intel architectures.
This generates code for both multi-byte and single-byte reads, but only enables the multi-byte reads on GNU C compatible compilers, as the following extensions are used:
- Byte-aligned integers
We must be able to generate integers that are aligned to a single byte using:
typedef uint64_t __attribute__((aligned (1))) triehash_uu64;
- Byte-order
The macros __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__ must be defined.
We forcibly disable multi-byte reads on platforms where the macro __ARM_ARCH is defined and __ARM_FEATURE_UNALIGNED is not defined, as there is a measurable overhead from emulating the unaligned reads on ARM.
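The conditions above can be sketched as follows for a single 4-byte read of the hypothetical word hash. All SKETCH_ and sketch_ names are invented; for brevity the sketch only takes the multi-byte path on little-endian targets, and it adds the GNU may_alias attribute so that the cast from a char pointer is well defined, which goes beyond the typedef shown above.

```c
#include <stddef.h>
#include <stdint.h>

/* Enable the multi-byte path only where the required extensions exist,
 * and never on ARM without hardware support for unaligned access. */
#if defined(__GNUC__) && defined(__BYTE_ORDER__) \
    && defined(__ORDER_LITTLE_ENDIAN__) \
    && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ \
    && !(defined(__ARM_ARCH) && !defined(__ARM_FEATURE_UNALIGNED))
#define SKETCH_MULTI_BYTE 1
/* Integer type aligned to a single byte; may_alias is this sketch's addition. */
typedef uint32_t __attribute__((aligned (1), may_alias)) sketch_uu32;
/* Value a little-endian 4-byte load yields for the bytes a, b, c, d. */
#define SKETCH_PACK4(a, b, c, d) \
    ((uint32_t)(a) | (uint32_t)(b) << 8 | \
     (uint32_t)(c) << 16 | (uint32_t)(d) << 24)
#else
#define SKETCH_MULTI_BYTE 0
#endif

/* Returns nonzero if s is exactly "hash". */
static int is_hash(const char *s, size_t len)
{
    if (len != 4)
        return 0;
#if SKETCH_MULTI_BYTE
    /* One unaligned 4-byte read replaces four single-byte comparisons. */
    return *(const sketch_uu32 *)s == SKETCH_PACK4('h', 'a', 's', 'h');
#else
    /* Portable single-byte fallback. */
    return s[0] == 'h' && s[1] == 'a' && s[2] == 's' && s[3] == 'h';
#endif
}
```

Both paths return the same result; the preprocessor guard only decides which one is compiled.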
- --language=language
Generate a file in the specified language. The currently known languages are 'C' and 'tree'; the latter generates a tree.
- --include=header
Add the header to the include statements of the header file. The value must be surrounded by quotes or angle brackets for C code. May be specified multiple times.
License
triehash is available under the MIT/Expat license; see the source code for more information.
Author
Julian Andres Klode <jak@jak-linux.org>