unicode_category_lookup - Man Page
unicode character categorization
Synopsis
#include <courier-unicode.h>
uint32_t unicode_category_lookup(char32_t c);
int unicode_isalnum(char32_t c);
int unicode_isalpha(char32_t c);
int unicode_isblank(char32_t c);
int unicode_isdigit(char32_t c);
int unicode_isgraph(char32_t c);
int unicode_islower(char32_t c);
int unicode_ispunct(char32_t c);
int unicode_isspace(char32_t c);
int unicode_isupper(char32_t c);
Description
unicode_category_lookup() looks up the unicode character's categorization[1] and returns a 32-bit value. The value's UNICODE_CATEGORY_1 bits specify the first level of the character's category; the UNICODE_CATEGORY_2, UNICODE_CATEGORY_3, and UNICODE_CATEGORY_4 bits specify the second, third, and fourth levels, if given. A value of 0 in a level's bits indicates that no category is specified at that level for this character; otherwise the possible values are defined in <courier-unicode.h>.
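The level-by-level layout described above can be sketched as plain bit masking. This is an illustrative sketch only: the DEMO_CATEGORY_* masks and both demo_* functions are hypothetical stand-ins, and the one-byte-per-level layout is an assumption. The real UNICODE_CATEGORY_1 through UNICODE_CATEGORY_4 masks are defined in <courier-unicode.h> and may be laid out differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in masks, one byte per level (an assumption).
 * The real UNICODE_CATEGORY_1..4 masks come from <courier-unicode.h>. */
#define DEMO_CATEGORY_1 UINT32_C(0xFF000000)
#define DEMO_CATEGORY_2 UINT32_C(0x00FF0000)
#define DEMO_CATEGORY_3 UINT32_C(0x0000FF00)
#define DEMO_CATEGORY_4 UINT32_C(0x000000FF)

/* Return the bits of `value` that belong to the given level (1 to 4).
 * A result of 0 means no category is specified at that level. */
uint32_t demo_category_level(uint32_t value, int level)
{
    static const uint32_t masks[4] = {
        DEMO_CATEGORY_1, DEMO_CATEGORY_2, DEMO_CATEGORY_3, DEMO_CATEGORY_4
    };

    assert(level >= 1 && level <= 4);
    return value & masks[level - 1];
}

/* Count how many consecutive levels, starting at the first,
 * carry a non-zero category in `value`. */
int demo_category_depth(uint32_t value)
{
    int depth = 0;

    while (depth < 4 && demo_category_level(value, depth + 1) != 0)
        depth++;
    return depth;
}
```

With the library installed, the same masking would be applied to the value returned by unicode_category_lookup(), using the UNICODE_CATEGORY_1 through UNICODE_CATEGORY_4 masks in place of the stand-ins.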
The remaining functions implement comparable equivalents of their non-unicode versions in the standard C library, as follows:
- unicode_isalnum()
Returns non-0 for all characters for which unicode_isalpha() or unicode_isdigit() returns non-0.
- unicode_isalpha()
Returns non-0 for all UNICODE_CATEGORY_1_LETTER.
- unicode_isblank()
Returns non-0 for TAB, and all UNICODE_CATEGORY_2_SPACE.
- unicode_isdigit()
Returns non-0 only for characters whose category is exactly UNICODE_CATEGORY_1_NUMBER | UNICODE_CATEGORY_2_DIGIT, with no third-level category.
- unicode_isgraph()
Returns non-0 for all codepoints above SPACE which are not unicode_isspace().
- unicode_islower()
Returns non-0 for all unicode_isalpha() for which the character is equal to unicode_lc(3) of itself.
- unicode_ispunct()
Returns non-0 for all UNICODE_CATEGORY_1_PUNCTUATION.
- unicode_isspace()
Returns non-0 for unicode_isblank() or for unicode characters with a line-breaking property of BK, CR, LF, NL, or SP.
- unicode_isupper()
Returns non-0 for all unicode_isalpha() for which the character is equal to unicode_uc(3) of itself.
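How the derived predicates build on the basic ones can be sketched as follows. This is a minimal illustration, not the library's implementation: every demo_* function is a hypothetical, ASCII-only stand-in for the corresponding unicode_*() function, and uint32_t is used in place of char32_t to keep the sketch portable.

```c
#include <stdint.h>

/* Hypothetical ASCII-only stand-ins for unicode_isalpha(),
 * unicode_isdigit(), unicode_lc() and unicode_uc(). The real
 * functions cover all of unicode. */
int demo_isalpha(uint32_t c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

int demo_isdigit(uint32_t c)
{
    return c >= '0' && c <= '9';
}

uint32_t demo_lc(uint32_t c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

uint32_t demo_uc(uint32_t c)
{
    return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

/* Alphanumeric: a letter or a digit. */
int demo_isalnum(uint32_t c)
{
    return demo_isalpha(c) || demo_isdigit(c);
}

/* Lowercase: a letter that is its own lowercase mapping. */
int demo_islower(uint32_t c)
{
    return demo_isalpha(c) && demo_lc(c) == c;
}

/* Uppercase: a letter that is its own uppercase mapping. */
int demo_isupper(uint32_t c)
{
    return demo_isalpha(c) && demo_uc(c) == c;
}
```

One consequence of defining case this way: a letter that both case mappings leave unchanged would satisfy both the lowercase and the uppercase test, so callers that need mutually exclusive answers may want to check both.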
See Also
Author
Sam Varshavchik
Notes
1. unicode character's categorization
   https://unicode.org/notes/tn36/
Referenced By
courier-unicode(7), unicode_uc(3).
The man pages unicode_isalnum(3), unicode_isalpha(3), unicode_isblank(3), unicode_isdigit(3), unicode_isgraph(3), unicode_islower(3), unicode_ispunct(3), unicode_isspace(3) and unicode_isupper(3) are aliases of unicode_category_lookup(3).