Diacritical Marks: A Complete Reference
What diacritical marks are (accents, umlauts, tildes, cedillas), with language-specific guides.
Diacritical marks are modifier glyphs attached to letters that change pronunciation, tone, or semantic meaning. From the French acute accent (é) to the Arabic fatha (◌َ), they are fundamental to accurate written representation of most of the world's languages.

What Is a Diacritical Mark?
A diacritical mark (or diacritic) is a glyph that appears above, below, through, or alongside a base character. It differs from punctuation in that it modifies a letter rather than the sentence. The word "diacritic" comes from Greek diakritikos, meaning "that separates or distinguishes."

The Unicode Standard defines the complete set of combining characters and their normalization behavior, with Unicode Standard Annex #15 (UAX #15, formerly Unicode Technical Report #15) covering normalization forms…
Frequently Asked Questions
What are diacritical marks?
Diacritical marks are glyphs added to a base letter to change its pronunciation, tone, or meaning. Common examples include the acute accent (é), the umlaut (ü), the tilde (ñ), and the cedilla (ç). They are distinct from punctuation — they attach to letters rather than standing alone.
What languages use diacritics?
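Most European languages use diacritics. French uses the acute, grave, and circumflex accents, the diaeresis, and the cedilla (é, è, ê, ë, à, ù, î, ï, ô, ç), alongside the ligature œ, which is a separate letterform rather than a diacritic. German uses umlauts (ä, ö, ü); the sharp s (ß) is a distinct letter, not a letter with a diacritic. Spanish uses the tilde (ñ), the acute accent (á, é, í, ó, ú), and the diaeresis (ü). Scandinavian languages use the ring (å), the slash (ø), and the umlauted letters ä and ö; the stroked ł belongs to Polish rather than to any Scandinavian language. Arabic and Hebrew use diacritical vowel marks that appear above or below consonants.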
What is the difference between NFC and NFD for diacritics?
In NFD (Decomposed) form, a character like é is stored as two code points: the base letter e (U+0065) plus a combining acute accent (U+0301). In NFC (Composed) form, é is stored as a single precomposed code point (U+00E9). The visual result is identical, but the underlying encoding differs — which matters for string length calculations, regex matching, and diacritic stripping.
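As a minimal sketch of the difference, the following Python snippet uses the standard library's unicodedata module to compare the two encodings of é. The helper name same_text is hypothetical and only illustrates one way to compare strings regardless of normalization form.

```python
import unicodedata

composed = "\u00e9"      # é as a single precomposed code point (NFC)
decomposed = "e\u0301"   # e followed by the combining acute accent (NFD)

# The two strings render identically but differ in length and equality.
print(len(composed), len(decomposed))   # 1 2
print(composed == decomposed)           # False

# Normalizing converts between the two forms.
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text("caf\u00e9", "cafe\u0301"))   # True
```

Normalizing both sides to the same form before comparing, hashing, or measuring length is usually enough to avoid surprises; whether NFC or NFD is the better target depends on the application.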
How do I add diacritics to a keyboard layout?
On macOS, hold the base letter key to see a popup with diacritic variants (for example, hold 'e' for é, è, ê, ë, ē). On Windows, switch to a keyboard layout that supports accented characters: US International uses dead keys (typing ' and then e produces é) and AltGr (right Alt) combinations, while national layouts such as French or Spanish provide dedicated keys or dead keys of their own. On Linux, the Compose key lets you type multi-key sequences, such as Compose followed by ' and a for á.
What are combining characters?
Combining characters are Unicode code points that render attached to the preceding base character. They have a General Category of Mn (Mark, Non-spacing), Mc (Mark, Spacing Combining), or Me (Mark, Enclosing). The combining acute accent (U+0301) turns a into á when placed after it. Stripping all Mn characters from NFD-normalized text removes most diacritical marks; letters such as ø and ł have no canonical decomposition, so they pass through unchanged.
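The stripping approach described above can be sketched in Python with the standard unicodedata module. The function name strip_diacritics is an assumption made for illustration, and the final recomposition to NFC is a design choice rather than a requirement.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: drop non-spacing marks (category Mn) from text."""
    # Decompose so each accent becomes its own combining code point.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything that is not a non-spacing combining mark.
    kept = [ch for ch in decomposed if unicodedata.category(ch) != "Mn"]
    # Recompose whatever remains (an optional choice made in this sketch).
    return unicodedata.normalize("NFC", "".join(kept))


print(unicodedata.category("\u0301"))        # Mn
print(strip_diacritics("crème brûlée"))      # creme brulee
print(strip_diacritics("señor"))             # senor
print(strip_diacritics("Łódź smørrebrød"))   # Łodz smørrebrød
```

The last line shows the limit of this technique: Ł and ø are standalone letters without a canonical decomposition, so they need an explicit mapping table if plain ASCII output is required. Note also that for scripts such as Arabic or Hebrew this removes the vowel marks, which may or may not be what the application wants.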