triehash - Man Page

Generate a perfect hash function derived from a trie.

Synopsis

triehash [option] [input file]

Description

triehash takes a list of words from the input file and generates a function and an enumeration describing those words.
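As a rough illustration of what such a function and enumeration can look like, the following hand-written C sketch maps three words to enumeration values by switching on the length and then on individual characters, the way a trie is walked. It is not output of triehash: the names ExampleKey and example_lookup, the kw_ labels, and the (string, length) signature are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical enumeration for the words "if", "in" and "int". */
enum ExampleKey { Unknown = 0, kw_if = 1, kw_in = 2, kw_int = 3 };

/* Hand-written trie walk: branch on the length first, then on characters. */
static enum ExampleKey example_lookup(const char *s, size_t len)
{
    switch (len) {
    case 2:
        if (s[0] != 'i')
            return Unknown;
        switch (s[1]) {
        case 'f': return kw_if;   /* "if" */
        case 'n': return kw_in;   /* "in" */
        }
        return Unknown;
    case 3:
        if (s[0] == 'i' && s[1] == 'n' && s[2] == 't')
            return kw_int;        /* "int" */
        return Unknown;
    default:
        return Unknown;           /* no word of this length */
    }
}
```

A caller would typically switch on the returned value, with Unknown covering every string that is not in the word list.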

Input File Format

The file consists of multiple lines of the form:

    [label ~ ] word [= value]

This maps word to value, and generates an enumeration with entries of the form:

    label = value

If label is omitted, the word is used as the label, with each minus character replaced by an underscore. If value is omitted, it is counted upwards from the last value.

There may also be one line of the form:

    [label ~ ] = value

This line defines the value to be used for non-existing keys. Note that, as with normal entries, it also changes the default value for the keys that follow. So if you place

    = 0

at the beginning of the file, unknown strings map to 0, and the other strings map to values starting with 1. If label is not specified, the default is Unknown.
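The rules above can be sketched with a small hypothetical input file and the enumeration they would imply. The enumeration below is written by hand from the description in this section, not copied from triehash output, so the exact layout and the enumeration name (ExampleKey here) are assumptions.

```c
/*
 * Hypothetical input file:
 *
 *     = 0
 *     alpha
 *     beta-gamma
 *     Delta ~ delta = 10
 *     epsilon
 */
enum ExampleKey {
    Unknown    = 0,   /* "= 0": value for unknown strings, default label */
    alpha      = 1,   /* no value given: counted upwards from the last value */
    beta_gamma = 2,   /* no label given: word used, '-' replaced by '_' */
    Delta      = 10,  /* explicit label and explicit value */
    epsilon    = 11   /* counting continues from the last value */
};
```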

Options

-C .c file, --code=.c file

Generate code in the given file.

-H header file, --header=header file

Generate a header in the given file, containing a declaration of the hash function and an enumeration.

--enum-name=word

The name of the enumeration.

--function-name=word

The name of the function.

--label-prefix=word

The prefix to use for labels.

--label-uppercase

Uppercase label names when normalizing them.

--namespace=name

Put the function and enum into a namespace (C++).

--class=name

Put the function and enum into a class (C++).

--enum-class

Generate an enum class instead of an enum (C++).

--counter-name=name

Use name for a counter that is set to the latest entry in the enumeration + 1. This can be useful for defining array sizes.
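A minimal sketch of the array-size use case, assuming the counter is usable as an integer constant expression. The enumeration is a hand-written stand-in, and ExampleKey, ExampleKeyCount, example_names and example_name are hypothetical names; how triehash actually emits the counter is not shown here.

```c
#include <stddef.h>

/* Hand-written stand-in for a generated enumeration (names are hypothetical). */
enum ExampleKey { Unknown = 0, alpha = 1, beta = 2 };

/* Hypothetical counter: latest entry in the enumeration + 1. */
enum { ExampleKeyCount = beta + 1 };

/* One slot per enumeration value, sized by the counter. */
static const char *const example_names[ExampleKeyCount] = {
    [Unknown] = "(unknown)",
    [alpha]   = "alpha",
    [beta]    = "beta",
};

/* Bounds-checked lookup of a printable name for a key. */
static const char *example_name(int key)
{
    if (key < 0 || key >= ExampleKeyCount)
        return example_names[Unknown];
    return example_names[key];
}
```

Sizing the table with the counter means that adding a word to the input file grows the array without any further change to the declaration.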

--ignore-case

Ignore case for words.

--multi-byte=value

Generate code reading multiple bytes at once. The value is a string of powers of two to enable. The default value is 320, meaning that 8-byte, 4-byte, and single-byte reads are enabled. Specify 0 to disable multi-byte reads completely, or add 2 if you also want to allow 2-byte reads. 2-byte reads are disabled by default because they negatively affect performance on older Intel architectures.

This generates code for both multi-byte and single-byte reads, but only enables the multi-byte reads on GNU C compatible compilers, as the following extensions are used:

Byte-aligned integers

We must be able to generate integers that are aligned to a single byte using:

    typedef uint64_t __attribute__((aligned (1))) triehash_uu64;

Byte-order

The macros __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__ must be defined.

We forcefully disable multi-byte reads on platforms where the macro __ARM_ARCH is defined and __ARM_FEATURE_UNALIGNED is not defined, as there is a measurable overhead from emulating the unaligned reads on ARM.
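The conditions above can be sketched as a preprocessor guard around a single unaligned 8-byte comparison, with a memcmp fallback. This is an illustration only, not the code triehash generates: EXAMPLE_MULTI_BYTE, example_uu64 and example_is_language are hypothetical names, and the may_alias attribute is an addition of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror the conditions described above (EXAMPLE_MULTI_BYTE is hypothetical). */
#if defined(__GNUC__) && defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) \
    && !(defined(__ARM_ARCH) && !defined(__ARM_FEATURE_UNALIGNED))
#define EXAMPLE_MULTI_BYTE 1
/* Byte-aligned 64-bit integer. may_alias is added in this sketch so that
 * reading character data through this type is permitted by the compiler. */
typedef uint64_t __attribute__((aligned (1), may_alias)) example_uu64;
#else
#define EXAMPLE_MULTI_BYTE 0
#endif

/* Return 1 if the len bytes at s spell "language", else 0. */
static int example_is_language(const char *s, size_t len)
{
    static const char word[8] = { 'l', 'a', 'n', 'g', 'u', 'a', 'g', 'e' };

    if (len != sizeof word)
        return 0;
#if EXAMPLE_MULTI_BYTE
    {
        uint64_t expected;
        /* Build the expected value in the same byte order as the input. */
        memcpy(&expected, word, sizeof expected);
        /* One 8-byte read; s does not need to be 8-byte aligned. */
        return *(const example_uu64 *)s == expected;
    }
#else
    /* Portable fallback without multi-byte reads. */
    return memcmp(s, word, sizeof word) == 0;
#endif
}
```

Because the expected value is built with memcpy, this sketch does not depend on byte order. Code that compares against precomputed integer constants would, which is presumably why the byte-order macros are required.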

--language=language

Generate a file in the specified language. The currently known languages are 'C' and 'tree'; the latter generates a tree.

--include=header

Add the header to the include statements of the header file. The value must be surrounded by quotes or angle brackets for C code. May be specified multiple times.

License

triehash is available under the MIT/Expat license; see the source code for more information.

Author

Julian Andres Klode <jak@jak-linux.org>

Info

2024-07-20 perl v5.40.0 User Contributed Perl Documentation