bibtexu - Man Page

UTF-8 Big BibTeX

Synopsis

bibtexu [options] aux-file

Description

BibTeXu is the Unicode-compliant version of BibTeX. It is largely based on Niel Kempson's BibTeX8 and provides better support for UTF-8 by integrating the ICU library. As a result, BibTeXu no longer requires the Codepage and Sort order ("CS") file; instead, sorting and case changing are controlled via command-line options.

Options

-?  --help

display some brief help text.

-d  --debug TYPE

report debugging information.  TYPE is one or more of all, csf, io, mem, misc, search.

-s  --statistics

report internal statistics.

-t  --trace

report execution tracing.

-v  --version

report BibTeX version.

-l  --language LANG

use language LANG when converting strings to lowercase. This argument is passed to the ICU library.

-o  --location LANG

use language LANG for sorting. This argument is passed to the ICU library.

-B  --big

set large BibTeX capacity.

-H  --huge

set huge BibTeX capacity.

-W  --wolfgang

set really huge BibTeX capacity for Wolfgang.

-M  --min_crossrefs ##

set min_crossrefs to ##.

--mstrings ##

allow ## unique strings.

Unicode Support

BibTeXu supports extended features to handle Unicode characters. Several built-in functions in bibliography styles are enhanced as follows.

&

Pops the top two (integer) literals and pushes their bitwise AND.

|

Pops the top two (integer) literals and pushes their bitwise OR.

add.period$

Pops the top (string) literal, adds a `.' to it if the last non-`}' character isn't a `.', `?', `!' or a Unicode punctuation mark, and pushes the resulting string. The recognized Unicode marks are U+203C, U+203D, U+2047, U+2048, U+2049, U+3002, U+FF01, U+FF0E and U+FF1F.
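The rule above can be sketched in Python. This is only an illustration, not BibTeXu's implementation: the helper name add_period is hypothetical, and the handling of empty or brace-only input is an assumption that the description does not cover.

```python
# Terminators named in the description: ASCII marks plus the listed Unicode marks.
TERMINATORS = set(".?!") | {
    "\u203C", "\u203D", "\u2047", "\u2048", "\u2049",
    "\u3002", "\uFF01", "\uFF0E", "\uFF1F",
}


def add_period(text: str) -> str:
    """Hypothetical helper approximating the add.period$ rule."""
    core = text.rstrip("}")  # look past any trailing closing braces
    if not core:
        # Assumption: empty or brace-only input is returned unchanged.
        return text
    if core[-1] in TERMINATORS:
        return text  # already ends in a sentence terminator
    return text + "."
```

For instance, add_period("{Really?}") leaves the string alone because the last non-`}' character is `?', while add_period("Title") appends a period.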

chr.to.int$

Pops the top (string) literal, makes sure it's a multibyte string of a single Unicode code point, converts it to the corresponding Unicode scalar value (integer), and pushes this integer.

int.to.chr$

Pops the top (integer) literal, interpreted as the Unicode scalar value of a single code point, converts it to the corresponding single character multibyte string, and pushes this string.
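As a rough analogy, chr.to.int$ and int.to.chr$ behave like Python's ord() and chr() on single code points. The wrapper names below are hypothetical, and raising ValueError on invalid input is an assumption; the description does not say how BibTeXu reports such cases.

```python
def chr_to_int(s: str) -> int:
    """Hypothetical analogue of chr.to.int$: one code point in, its scalar value out."""
    if len(s) != 1:
        # Assumption: anything other than exactly one code point is rejected.
        raise ValueError("expected a string of exactly one code point")
    return ord(s)


def int_to_chr(n: int) -> str:
    """Hypothetical analogue of int.to.chr$: scalar value in, one-character string out."""
    # Unicode scalar values are 0..0x10FFFF excluding the surrogate range.
    if not 0 <= n <= 0x10FFFF or 0xD800 <= n <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    return chr(n)
```

The two functions are inverses on valid input, so int_to_chr(chr_to_int(c)) returns c for any single character c.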

num.names$,  format.name$

These functions work as in the original BibTeX, with two additions. An Ideographic Comma (U+3001) or a Fullwidth Comma (U+FF0C) is accepted as a separator between persons, in addition to the " and " string. An Ideographic Space (U+3000) is accepted as a separator between a family name and a given name, in addition to a space " ".
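The person separators can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the helper split_names is hypothetical, and it ignores brace nesting, which BibTeX takes into account when it looks for " and ".

```python
import re

# " and " surrounded by whitespace, or an Ideographic/Fullwidth Comma.
# Assumption: brace depth is ignored in this simplified version.
_PERSON_SEPARATOR = re.compile(r"\s+and\s+|[\u3001\uFF0C]", re.IGNORECASE)


def split_names(field: str) -> list[str]:
    """Hypothetical helper: split an author field into individual persons."""
    return [part.strip() for part in _PERSON_SEPARATOR.split(field) if part.strip()]
```

With this sketch, len(split_names(field)) plays the role of num.names$. Note that the ASCII comma in "Knuth, Donald" is not a person separator, so that string still counts as one name.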

substring$,  text.length$,  text.prefix$

These functions work as in the original BibTeX, but their numeric operands are counted in Unicode code points rather than bytes.
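The difference between counting bytes and counting code points can be seen in a short Python sketch. The function names are hypothetical, and the substring helper covers only the simple case of a positive, 1-based start position; it does not model BibTeX's special handling of braces or negative positions.

```python
def utf8_byte_length(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


def code_point_length(s: str) -> int:
    """Number of Unicode code points in s (Python strings index by code point)."""
    return len(s)


def substring(s: str, start: int, length: int) -> str:
    """Hypothetical simplification of substring$ for a positive 1-based start."""
    if start < 1 or length < 1:
        return ""  # assumption: other cases are out of scope for this sketch
    return s[start - 1 : start - 1 + length]
```

A string such as "日本語" occupies nine bytes in UTF-8 but is three code points long, so a byte-based count would cut characters in the middle where a code-point-based count does not.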

change.case$

The function works as in the original BibTeX, but non-English Latin, Greek and Cyrillic letters are also supported.

width$

The function works as in the original BibTeX, but Latin-1, Latin Extended-A and CJK characters are also supported.

is.cjk.str$

Pops the top (string) literal and pushes an integer whose flag bits indicate which kinds of CJK characters were found in the string; pushes 0 if none were found. Flags 0x001, 0x002, 0x004, 0x008 and 0x800 correspond to Hanzi (Kanji, Hanja), Kana, Hangul, Bopomofo and other CJK characters, respectively. For example, the integer 0x003 is pushed if Hanzi and Kana characters are found in the popped string literal.
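The flag layout can be decoded as in the following Python sketch. The bit values come from the description above; the table and function names are hypothetical and serve only to show how the integer is read.

```python
# Flag bits as listed in the description of is.cjk.str$.
CJK_FLAGS = [
    (0x001, "Hanzi"),
    (0x002, "Kana"),
    (0x004, "Hangul"),
    (0x008, "Bopomofo"),
    (0x800, "other CJK"),
]


def describe_cjk_flags(value: int) -> list[str]:
    """Hypothetical helper: names of the script classes whose bits are set in value."""
    return [name for bit, name in CJK_FLAGS if value & bit]
```

Calling describe_cjk_flags(0x003) yields both "Hanzi" and "Kana", matching the example in the text, and a value of 0 yields an empty list.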

is.kanji.str$

Same as is.cjk.str$ for compatibility with (u)pBibTeX.

See Also

A more detailed description of BibTeXu is available at $TEXMFDIST/doc/bibtexu/README.

Authors

BibTeXu was written by Yannis Haralambous and his students. It is maintained as part of TeX Live.

This manpage was written for TeX Live.

Info

30 August 2022 bibtexu 4.00