CharacterNormalize
CharacterNormalize["text",form]
converts the characters in text to the specified normalization form.
Details
- CharacterNormalize supports the following Unicode normalization forms:
    "NFD"     canonical decomposition (Form D)
    "NFC"     canonical decomposition, followed by canonical composition (Form C)
    "NFKD"    compatibility decomposition (Form KD)
    "NFKC"    compatibility decomposition, followed by canonical composition (Form KC)
- In CharacterNormalize[text,…], text can be a string or a list of strings.
- In "NFD" and "NFC", canonical decomposition refers to these four types of operations:
    Å → Å, …                   decompose marks
    Ȱ → Ȱ, …                   decompose and order marks
    한 → 한, …                  decompose Hangul and conjoining Jamo
    Ω (Ohm) → Ω (Omega), …     map character to its canonical Unicode equivalent
- In "NFKD" and "NFKC", compatibility decomposition refers to operations such as:
    ℌ → H, ℍ → H, …            normalize font variants
    (NBSP) → (Space), …        normalize linebreaking differences
    ﻉ → ع, ﻊ → ع, …            normalize positional variants
    ① → 1, …                   normalize circled variants
    カ → カ, …                  normalize width variants
    ︷ → {, ︸ → }, …            normalize rotated variants
    i⁹ → i9, i₉ → i9, …        normalize subscripts/superscripts
    ㌀ → アパート, …             decompose squared characters
    ¼ → 1/4, …                 normalize fractions
    dž → dž, …                 other normalizations
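Since the original and normalized characters usually render identically, the difference between the forms is easier to see at the level of character codes. The following is a minimal sketch, not taken from the reference page, that builds its inputs from explicit code points (the variable names precomposed and decomposed are illustrative; 197 is U+00C5, 778 is U+030A and 9312 is U+2460):

```wolfram
(* "Å" as one precomposed character, and as "A" followed by a combining ring *)
precomposed = FromCharacterCode[197];
decomposed = FromCharacterCode[{65, 778}];

(* The two strings look alike but hold different character codes *)
precomposed === decomposed                             (* False *)

(* Canonical decomposition and composition convert one into the other *)
CharacterNormalize[precomposed, "NFD"] === decomposed  (* True *)
CharacterNormalize[decomposed, "NFC"] === precomposed  (* True *)

(* A circled digit is changed only by the compatibility forms *)
ToCharacterCode[CharacterNormalize[FromCharacterCode[9312], "NFC"]]   (* {9312} *)
ToCharacterCode[CharacterNormalize[FromCharacterCode[9312], "NFKC"]]  (* {49} *)
```

Normalizing both sides to the same form before comparing them is a common way to treat canonically equivalent strings as equal.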
Examples
Basic Examples (5)
Normalize string characters using canonical decomposition:
CharacterNormalize["DŽuńglã", "NFD"]
Normalize string characters using compatibility decomposition:
CharacterNormalize["DŽuńglã", "NFKD"]
Normalize string characters using compatibility decomposition followed by canonical composition:
CharacterNormalize["DŽuńglã", "NFKC"]
Normalize string characters using canonical decomposition followed by canonical composition:
CharacterNormalize["DŽuńglã", "NFC"]
Normalize the characters in the string using compatibility decomposition:
str = CharacterNormalize["Türkçe", "NFKD"]
Characters with diacritics have been decomposed:
Characters[str]
Scope (2)
Decompose a composite character into its constituents:
CharacterNormalize["Ύ", "NFD"]
Ordering of the mark and the character has changed after normalization:
Characters[%]
Obtain the "Ohm" character from its code:
ohm = FromCharacterCode[8486]
NFD maps characters to their canonical Unicode equivalents. Normalize the character using NFD:
omega = CharacterNormalize[ohm, "NFD"]
Convert the output (omega) to its character code:
ToCharacterCode[omega]
Generalizations & Extensions (1)
CharacterNormalize threads itself elementwise over lists:
CharacterNormalize[{"Witaj", "świecie"}, "NFD"]
CharacterNormalize works on strings of different scripts and letters:
CharacterNormalize[{"Ευρώπη", "Википедий", "Української Вікіпедії", "アルファベット"}, "NFD"]
Possible Issues (1)
Compatibility equivalence may convert different forms of a character to a single canonical form, whereas canonical equivalence leaves them distinct:
CharacterNormalize[{"ﻉ", "ﻊ", "ﻋ", "ﻌ"}, "NFKC"]
CharacterNormalize[{"ﻉ", "ﻊ", "ﻋ", "ﻌ"}, "NFC"]
Compatibility equivalence may remove formatting distinctions that canonical equivalence preserves:
CharacterNormalize[{"i⁹", "i₉", "¼"}, "NFKC"]
CharacterNormalize[{"i⁹", "i₉", "¼"}, "NFC"]
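Because positional variants can be hard to tell apart by eye, comparing character codes shows more directly what compatibility normalization removes. This is a minimal sketch, assuming the four positional forms of the Arabic letter ain sit at code points 65225 to 65228 (U+FEC9 to U+FECC) and the base letter at 1593 (U+0639); the variable name forms is illustrative:

```wolfram
(* Isolated, final, initial and medial presentation forms of ain *)
forms = FromCharacterCode /@ {65225, 65226, 65227, 65228};

(* Compatibility normalization maps all four to the base letter *)
ToCharacterCode[CharacterNormalize[forms, "NFKC"]]
(* {{1593}, {1593}, {1593}, {1593}} *)

(* Canonical normalization leaves the presentation forms unchanged *)
ToCharacterCode[CharacterNormalize[forms, "NFC"]]
(* {{65225}, {65226}, {65227}, {65228}} *)
```

If the positional or formatting distinction carries meaning in the text, "NFC" or "NFD" is the safer choice, since the mapping performed by "NFKC" and "NFKD" cannot be undone.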
Text
Wolfram Research (2020), CharacterNormalize, Wolfram Language function, https://reference.wolfram.com/language/ref/CharacterNormalize.html.
CMS
Wolfram Language. 2020. "CharacterNormalize." Wolfram Language & System Documentation Center. Wolfram Research. https://reference.wolfram.com/language/ref/CharacterNormalize.html.
APA
Wolfram Language. (2020). CharacterNormalize. Wolfram Language & System Documentation Center. Retrieved from https://reference.wolfram.com/language/ref/CharacterNormalize.html
BibTeX
@misc{reference.wolfram_2026_characternormalize, author="Wolfram Research", title="{CharacterNormalize}", year="2020", howpublished="\url{https://reference.wolfram.com/language/ref/CharacterNormalize.html}", note="[Accessed: 12-June-2026]"}
BibLaTeX
@online{reference.wolfram_2026_characternormalize, organization={Wolfram Research}, title={CharacterNormalize}, year={2020}, url={https://reference.wolfram.com/language/ref/CharacterNormalize.html}, note={[Accessed: 12-June-2026]}}