utf8proc is a mapping tool for UTF-8 strings with the following features:
- decomposing and composing of strings
- replacing compatibility characters with their equivalents
- stripping of "default ignorable characters" like SOFT-HYPHEN or ZERO-WIDTH-SPACE
- folding of certain characters for string comparison (e.g. HYPHEN U+2010 and MINUS U+2212 to ASCII "-") (see "LUMP" option)
- optional rejection of strings containing non-assigned code points
- stripping of control characters
- stripping of character marks (accents, etc.)
- transformation of LF, CRLF, CR and NEL to line-feed (LF) or to the Unicode characters for paragraph separation (PS) or line separation (LS)
- Unicode case folding (for case-insensitive string comparisons)
- rejection of illegal UTF-8 data (e.g. UTF-8 encoded UTF-16 surrogates)
- support for Korean Hangul characters
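
To make the "illegal UTF-8 data" item above more concrete, here is a minimal sketch of how UTF-8 encoded UTF-16 surrogates can be detected in a byte buffer. The function name contains_encoded_surrogate is hypothetical and is not part of utf8proc; this is not the library's implementation. It only illustrates the byte pattern involved, and it checks for this one kind of illegal data, not for full UTF-8 validity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, NOT part of the utf8proc API.
 *
 * Code points U+D800..U+DFFF (UTF-16 surrogates) would be encoded in
 * UTF-8 as the three bytes ED A0 80 .. ED BF BF. The byte 0xED can only
 * be a lead byte, never a continuation byte, so it is enough to look for
 * 0xED followed by a byte in the range 0xA0..0xBF.
 *
 * Returns 1 if such a sequence is found, 0 otherwise.
 */
static int contains_encoded_surrogate(const uint8_t *s, size_t len)
{
    size_t i;

    assert(s != NULL || len == 0);

    for (i = 0; i + 1 < len; i++) {
        if (s[i] == 0xED && s[i + 1] >= 0xA0 && s[i + 1] <= 0xBF)
            return 1;
    }
    return 0;
}
```

With this sketch, the bytes ED A0 80 (an encoding of U+D800) are flagged, while ED 9F BF (U+D7FF, a legal code point) are not.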
Unicode Version 7.0.0 is supported.
See utf8proc.h for the API.