About

Stripping diacritics during data migration or ASCII normalization is common; restoring them is not. Missing or incorrect diacritical marks change meaning: résumé becomes resume, café becomes cafe, and the Polish łódź becomes the unrecognizable lodz. This tool maps each base Latin character (a-z, A-Z) to its diacritical variant using a dictionary of over 200 Unicode code points across 15 diacritic categories. You select a diacritic type or a language preset, and the tool applies the transformation character by character. Characters without a known diacritical form pass through unchanged. The output is standard UTF-8 text, safe for HTML, databases, and print. Note: the tool applies diacritics uniformly or by preset rules. It does not perform linguistic analysis and cannot determine whether a mark is correct in the context of a sentence.

Formulas

The transformation applies a deterministic character-level mapping function:

output_i = map(input_i, D)

where input_i is the i-th character of the source string and D is the selected diacritic type. The mapping function is defined as:

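$$
\mathrm{map}(\mathrm{input}_i, D) =
\begin{cases}
\mathrm{dict}[D][\mathrm{input}_i] & \text{if } \mathrm{input}_i \in \mathrm{dict}[D] \\
\mathrm{input}_i & \text{otherwise}
\end{cases}
$$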

where dict is the complete lookup table containing over 200 mappings across 15 diacritic categories. For language presets, a composite mapping is used. Each preset defines a set of character-specific diacritic rules R_lang = {(c₁, D₁), (c₂, D₂), …} that maps each character c to its language-appropriate diacritic D. The total character space is the Latin alphabet: |A| = 52 (uppercase + lowercase).
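The mapping function can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the names `DICT`, `map_char`, and `apply_diacritic` are hypothetical, and the table holds only a small subset of the mappings described above.

```python
# Illustrative subset of the lookup table. The real table is described as
# holding over 200 mappings across 15 categories; these entries are assumptions.
DICT = {
    "acute": {"a": "á", "e": "é", "o": "ó", "A": "Á", "E": "É", "O": "Ó"},
    "cedilla": {"c": "ç", "s": "ş", "C": "Ç", "S": "Ş"},
}


def map_char(ch: str, diacritic: str) -> str:
    """Return dict[D][ch] when a mapping is defined, otherwise ch unchanged."""
    return DICT[diacritic].get(ch, ch)


def apply_diacritic(text: str, diacritic: str) -> str:
    """Apply the selected diacritic type character by character."""
    return "".join(map_char(ch, diacritic) for ch in text)


print(apply_diacritic("Oboe 1", "acute"))   # Óbóé 1
print(apply_diacritic("cost", "cedilla"))   # çoşt
```

In this sketch "b", the space, and the digit have no entry under "acute", so they fall through the `otherwise` branch and come out unchanged.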

Reference Data

Diacritic Name	Symbol Example	Unicode Range	Languages	Affected Letters
Acute (´)	á é ó	U+00C1 - U+01FF	French, Spanish, Portuguese, Hungarian, Czech, Polish	a, c, e, i, l, n, o, r, s, u, y, z
Grave (`)	à è ù	U+00C0 - U+01F9	French, Italian, Portuguese, Vietnamese	a, e, i, o, u
Circumflex (^)	â ê ô	U+00C2 - U+0176	French, Portuguese, Romanian, Welsh, Vietnamese	a, c, e, g, h, i, j, o, s, u, w, y
Tilde (~)	ã ñ õ	U+00C3 - U+0169	Spanish, Portuguese, Estonian, Vietnamese	a, e, i, n, o, u, v, y
Umlaut / Diaeresis (¨)	ä ö ü	U+00C4 - U+0178	German, Turkish, Finnish, Swedish, Hungarian	a, e, i, o, u, y
Cedilla (¸)	ç ş ţ	U+00C7 - U+0163	French, Portuguese, Turkish, Romanian	c, d, e, g, h, k, l, n, r, s, t
Ring (˚)	å ů	U+00C5 - U+016F	Swedish, Danish, Norwegian, Czech	a, u
Caron / Háček (ˇ)	č š ž ř	U+010C - U+017E	Czech, Slovak, Slovenian, Croatian, Lithuanian	a, c, d, e, g, h, i, j, k, l, n, o, r, s, t, u, z
Macron (¯)	ā ē ī ō ū	U+0100 - U+0233	Latvian, Lithuanian, Maori, Japanese Romaji, Hawaiian	a, e, g, i, o, u, y
Breve (˘)	ă ĕ ğ	U+0102 - U+016D	Romanian, Turkish, Vietnamese, Esperanto	a, e, g, i, o, u
Ogonek (˛)	ą ę į ų	U+0104 - U+0173	Polish, Lithuanian, Navajo	a, e, i, o, u
Dot Above (˙)	ċ ė ġ İ ż	U+010A - U+017C	Polish, Lithuanian, Turkish, Maltese	a, b, c, d, e, f, g, h, i, m, n, o, p, r, s, t, w, x, y, z
Stroke (Đ/đ)	đ ħ ł ø ŧ	U+00D0 - U+0167	Polish, Danish, Norwegian, Vietnamese, Croatian	d, h, l, o, t
Double Acute (˝)	ő ű	U+0150 - U+0171	Hungarian	o, u
Horn (ơ/ư)	ơ ư	U+01A0 - U+01B0	Vietnamese	o, u

Frequently Asked Questions

Only base Latin letters (a-z, A-Z) that have a defined mapping for the selected diacritic type are transformed. Digits, punctuation, whitespace, and characters already carrying diacritics pass through unchanged. For example, selecting "Cedilla" affects only c, s, t, and the other letters listed for it in the reference table; the letter "b" has no cedilla variant in Unicode and remains as-is.

Single diacritic mode applies one mark (e.g., acute) uniformly to every eligible letter. Language presets apply mixed diacritics based on which marks that language actually uses. For example, the French preset applies cedilla to "c", circumflex to "a/e/i/o/u", and acute to "e", because French uses all three. The Czech preset applies háček to c, s, z, r and acute to a, e, i, o, u, y.
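A preset of this kind can be sketched as a per-character rule table layered over the per-diacritic mappings. The sketch below assumes the Czech rules exactly as stated above; `TABLE`, `CZECH_RULES`, and `apply_preset` are hypothetical names, and deriving uppercase output with `str.upper()` is a shortcut for brevity rather than the tool's documented behavior.

```python
# Lowercase mappings for the two marks the Czech preset is described as using.
TABLE = {
    "acute": {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú", "y": "ý"},
    "caron": {"c": "č", "s": "š", "z": "ž", "r": "ř"},
}

# R_lang as a dictionary: base character -> diacritic type to apply.
CZECH_RULES = {
    **{c: "caron" for c in "cszr"},
    **{c: "acute" for c in "aeiouy"},
}


def apply_preset(text: str, rules: dict) -> str:
    """Apply each character's language-specific diacritic, preserving case."""
    out = []
    for ch in text:
        base = ch.lower()
        diacritic = rules.get(base)
        if diacritic is None:
            out.append(ch)  # no rule for this character: pass through
            continue
        marked = TABLE[diacritic][base]
        out.append(marked.upper() if ch.isupper() else marked)
    return "".join(out)


print(apply_preset("Cesky rizek", CZECH_RULES))  # Čéšký řížék
```

The output marks every eligible letter, including vowels that correct Czech spelling would leave bare, which is the uniform, non-linguistic behavior the tool describes.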

Yes. All output uses precomposed Unicode characters (NFC normalization form), not combining sequences. This means "é" is stored as U+00E9 (a single code point), not as "e" + U+0301 (two code points). Precomposed forms are universally supported in UTF-8 databases, HTML, CSS, and all modern text rendering engines.
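The difference between the two encodings can be checked with Python's standard `unicodedata` module. This is an illustrative check only; the helper name `is_nfc` is an assumption and not part of the tool.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point (U+00E9)
combining = "e\u0301"    # "e" followed by U+0301 COMBINING ACUTE ACCENT


def is_nfc(text: str) -> bool:
    """True when NFC normalization leaves the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


print(len(precomposed), len(combining))                        # 1 2
print(is_nfc(precomposed), is_nfc(combining))                  # True False
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
```

The two strings render identically but compare unequal until normalized, which is why a consistent output form matters for database lookups and string comparison.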

Contextual diacritic restoration requires a natural language processing model with a dictionary of millions of word forms per language. The word "resume" could be "résumé" (French/English) or remain "resume" (English verb). This ambiguity cannot be resolved without sentence-level NLP. The tool instead provides deterministic, predictable transformations that you control.

The tool applies one transformation at a time. However, because characters already carrying diacritics are not re-transformed, you can copy the output, paste it back, and apply a second diacritic type. The previously modified characters pass through unchanged, while the remaining eligible characters receive the second mark. Language presets handle multi-diacritic scenarios in a single pass.
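The two-pass workflow, and the fact that pass order matters, can be illustrated with a small sketch. The table below is an assumed subset and `apply_diacritic` is a hypothetical stand-in for the tool's transformation.

```python
# Assumed subset of the lookup table, lowercase only for brevity.
TABLE = {
    "acute": {"a": "á", "c": "ć", "e": "é"},
    "cedilla": {"c": "ç"},
}


def apply_diacritic(text: str, diacritic: str) -> str:
    """Map each base letter that has an entry; pass everything else through."""
    mapping = TABLE[diacritic]
    return "".join(mapping.get(ch, ch) for ch in text)


# Cedilla first: "c" becomes "ç", which the acute pass then leaves alone.
cedilla_then_acute = apply_diacritic(apply_diacritic("facade", "cedilla"), "acute")
# Acute first: "c" becomes "ć", so the cedilla pass finds nothing to change.
acute_then_cedilla = apply_diacritic(apply_diacritic("facade", "acute"), "cedilla")

print(cedilla_then_acute)  # fáçádé
print(acute_then_cedilla)  # fáćádé
```

Because an already-marked character is no longer a key in any mapping, whichever pass runs first claims the letters both types could affect.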

Each diacritic mapping includes both uppercase and lowercase variants. Inputting "A" with acute selected produces "Á" (U+00C1), while "a" produces "á" (U+00E1). Case is always preserved through the transformation.
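As a sketch of why explicit uppercase entries are the safer design, the snippet below derives uppercase mappings from lowercase ones with `str.upper()` and then shows a case where generic case conversion does not round-trip. The names `LOWER_ACUTE` and `with_uppercase` are hypothetical, and deriving entries this way is an assumption for illustration, not how the tool is documented to build its table.

```python
LOWER_ACUTE = {"a": "á", "e": "é"}


def with_uppercase(lower_map: dict) -> dict:
    """Extend a lowercase mapping with uppercase entries derived via str.upper()."""
    full = dict(lower_map)
    for base, marked in lower_map.items():
        full[base.upper()] = marked.upper()
    return full


ACUTE = with_uppercase(LOWER_ACUTE)
print(ACUTE["A"] == "\u00c1", ACUTE["a"] == "\u00e1")  # True True

# Caveat: capital I with dot above (U+0130) lowercases to two code points,
# "i" plus U+0307, so generic case conversion cannot recover a plain "i".
dotted_capital_i = "\u0130"
print(len(dotted_capital_i.lower()))  # 2
```

Storing both variants in the table, as the tool does, avoids depending on such case-conversion edge cases.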