About

Stripping diacritics during data migration or ASCII normalization is common; restoring them is not. Missing or incorrect diacritical marks change meaning or obscure a word: résumé becomes resume, café becomes cafe, and the Polish city name Łódź becomes Lodz. This tool maps each base Latin character (a-z, A-Z) to its diacritical variant using a dictionary of over 200 Unicode code points across 15 diacritic categories. You select a diacritic type or a language preset, and the tool applies the transformation character by character. Characters without a known diacritical form pass through unchanged. The output is standard UTF-8 text suitable for HTML, databases, and print. Note: the tool applies diacritics uniformly or by preset rules. It does not perform linguistic analysis and cannot determine which marks are correct for a given word or sentence.
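
To make the "stripping is easy, restoring is not" point concrete, here is a minimal sketch of the stripping direction using Python's standard `unicodedata` module. The function name `strip_diacritics` is hypothetical and is not part of this tool; it only illustrates how the information is lost in the first place.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("résumé"))  # resume
print(strip_diacritics("café"))    # cafe
# The stroke in Ł is not a combining mark, so this approach leaves it in place.
print(strip_diacritics("Łódź"))    # Łodz
```

Once the marks are gone, nothing in "resume" records whether it came from "résumé" or from the English verb, which is why the reverse direction cannot be fully automated by a character-level tool.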


Formulas

The transformation applies a deterministic character-level mapping function:

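$$\text{output}_i = \operatorname{map}(\text{input}_i, D)$$

where $\text{input}_i$ is the i-th character of the source string and $D$ is the selected diacritic type. The mapping function is defined as:

$$\operatorname{map}(\text{input}_i, D) = \begin{cases} \text{dict}[D][\text{input}_i] & \text{if } \text{input}_i \in \text{dict}[D] \\ \text{input}_i & \text{otherwise} \end{cases}$$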

where dict is the complete lookup table containing over 200 mappings across 15 diacritic categories. For language presets, a composite mapping is used. Each preset defines a set of character-specific diacritic rules $R_{\text{lang}} = \{(c_1, D_1), (c_2, D_2), \dots\}$ that maps each character $c$ to its language-appropriate diacritic $D$. The total character space is the Latin alphabet: $|A| = 52$ (uppercase + lowercase).
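
The mapping function can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual source: the table `DICT` below is a hypothetical, heavily truncated stand-in for the full lookup table, and the function names are invented for the example.

```python
# Hypothetical, truncated lookup table; the real table is described as
# holding over 200 mappings across 15 diacritic categories.
DICT = {
    "acute": {"a": "á", "e": "é", "o": "ó", "A": "Á", "E": "É", "O": "Ó"},
    "cedilla": {"c": "ç", "s": "ş", "t": "ţ", "C": "Ç", "S": "Ş", "T": "Ţ"},
}

def map_char(ch: str, diacritic: str) -> str:
    """Return dict[D][ch] when ch has a mapping under D, else ch unchanged."""
    return DICT[diacritic].get(ch, ch)

def apply_diacritic(text: str, diacritic: str) -> str:
    """Apply the mapping independently to each character of the input."""
    return "".join(map_char(ch, diacritic) for ch in text)

print(apply_diacritic("Cafe 42", "acute"))   # Cáfé 42
print(apply_diacritic("facts", "cedilla"))   # façţş
```

Because each character is looked up on its own, digits, spaces, and unmapped letters fall through the `otherwise` branch, and uppercase and lowercase are handled by separate table entries.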

Reference Data

| Diacritic Name | Symbol Example | Unicode Range | Languages | Affected Letters |
|---|---|---|---|---|
| Acute (´) | á é ó | U+00C1–U+01FF | French, Spanish, Portuguese, Hungarian, Czech, Polish | a, c, e, i, l, n, o, r, s, u, y, z |
| Grave (`) | à è ù | U+00C0–U+01F9 | French, Italian, Portuguese, Vietnamese | a, e, i, o, u |
| Circumflex (^) | â ê ô | U+00C2–U+0176 | French, Portuguese, Romanian, Welsh, Vietnamese | a, c, e, g, h, i, j, o, s, u, w, y |
| Tilde (~) | ã ñ õ | U+00C3–U+0169 | Spanish, Portuguese, Estonian, Vietnamese | a, e, i, n, o, u, v, y |
| Umlaut / Diaeresis (¨) | ä ö ü | U+00C4–U+0178 | German, Turkish, Finnish, Swedish, Hungarian | a, e, i, o, u, y |
| Cedilla (¸) | ç ş ţ | U+00C7–U+0163 | French, Portuguese, Turkish, Romanian | c, d, e, g, h, k, l, n, r, s, t |
| Ring (˚) | å ů | U+00C5–U+016F | Swedish, Danish, Norwegian, Czech | a, u |
| Caron / Háček (ˇ) | č š ž ř | U+010C–U+017E | Czech, Slovak, Slovenian, Croatian, Lithuanian | a, c, d, e, g, h, i, j, k, l, n, o, r, s, t, u, z |
| Macron (¯) | ā ē ī ō ū | U+0100–U+0233 | Latvian, Lithuanian, Maori, Japanese Romaji, Hawaiian | a, e, g, i, o, u, y |
| Breve (˘) | ă ĕ ğ | U+0102–U+016D | Romanian, Turkish, Vietnamese, Esperanto | a, e, g, i, o, u |
| Ogonek (˛) | ą ę į ų | U+0104–U+0173 | Polish, Lithuanian, Navajo | a, e, i, o, u |
| Dot Above (˙) | ċ ė ġ İ ż | U+010A–U+017C | Polish, Lithuanian, Turkish, Maltese | a, b, c, d, e, f, g, h, i, m, n, o, p, r, s, t, w, x, y, z |
| Stroke (Đ/đ) | đ ħ ł ø ŧ | U+00D0–U+0167 | Polish, Danish, Norwegian, Vietnamese, Croatian | d, h, l, o, t |
| Double Acute (˝) | ő ű | U+0150–U+0171 | Hungarian | o, u |
| Horn (ơ/ư) | ơ ư | U+01A0–U+01B0 | Vietnamese | o, u |
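
One way to cross-check the "Affected Letters" column is to ask Unicode itself which base letters have a precomposed form for a given mark. The sketch below does this with Python's standard `unicodedata` module by composing a base letter with a combining mark under NFC. The helper names `compose` and `affected_letters` are hypothetical, and this is not necessarily how the tool's table was built: Unicode may define precomposed forms for letters the table omits, and stroke letters such as ł and ø have no canonical decomposition, so they cannot be found this way.

```python
import string
import unicodedata
from typing import List, Optional

COMBINING_ACUTE = "\u0301"
COMBINING_CEDILLA = "\u0327"

def compose(base: str, mark: str) -> Optional[str]:
    """Return the single precomposed character for base + mark, if one exists."""
    composed = unicodedata.normalize("NFC", base + mark)
    return composed if len(composed) == 1 else None

def affected_letters(mark: str) -> List[str]:
    """List lowercase ASCII letters that have a precomposed form with mark."""
    return [c for c in string.ascii_lowercase if compose(c, mark) is not None]

print(compose("e", COMBINING_ACUTE))    # é
print(compose("b", COMBINING_CEDILLA))  # None
print(affected_letters(COMBINING_CEDILLA))
```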

Frequently Asked Questions

Only base Latin letters (a-z, A-Z) that have a defined mapping for the selected diacritic type are transformed. Digits, punctuation, whitespace, and characters already carrying diacritics pass through unchanged. For example, selecting "Cedilla" only affects c, s, t, and a few others; the letter "b" has no precomposed cedilla variant in Unicode and remains as-is.
Single-diacritic mode applies one mark (e.g., acute) to every eligible letter uniformly. Language presets apply mixed diacritics based on which marks that language actually uses. For example, the French preset applies cedilla to "c", circumflex to "a/e/i/o/u", and acute to "e", because French uses all three. The Czech preset applies háček to c, s, z, r and acute to a, e, i, o, u, y.
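
A language preset can be pictured as one merged table in which each base letter is bound to exactly one diacritic. The sketch below encodes the Czech rules described above; the names `CZECH_PRESET` and `apply_preset` are hypothetical, and only lowercase letters are included for brevity.

```python
from typing import Mapping

# Czech rules as described: háček on c, s, z, r; acute on a, e, i, o, u, y.
CARON = {"c": "č", "s": "š", "z": "ž", "r": "ř"}
ACUTE = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú", "y": "ý"}

# The key sets do not overlap, so merging them gives one rule per letter.
CZECH_PRESET = {**CARON, **ACUTE}

def apply_preset(text: str, preset: Mapping[str, str]) -> str:
    """Replace each character that has a preset rule; keep the rest."""
    return "".join(preset.get(ch, ch) for ch in text)

print(apply_preset("cesky", CZECH_PRESET))  # čéšký
```

The output "čéšký" is not the real Czech word "český": the preset marks every eligible letter, which is the uniform behavior the tool documents rather than a dictionary-based restoration.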
The output is safe to store and display: it uses precomposed Unicode characters (the form produced by NFC normalization), not combining sequences. This means "é" is stored as U+00E9 (a single code point), not as "e" + U+0301 (two code points). Precomposed forms are widely supported in UTF-8 databases, HTML, CSS, and modern text rendering engines.
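
The difference between the precomposed and combining forms is easy to observe with Python's standard `unicodedata` module. This is a small illustration of the two representations, with hypothetical variable names:

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point (U+00E9)
combining = "e\u0301"    # e followed by COMBINING ACUTE ACCENT (U+0301)

print(len(precomposed), len(combining))   # 1 2
print(precomposed == combining)           # False
# NFC normalization turns the combining sequence into the precomposed form.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
# UTF-8 sizes differ as well: 2 bytes versus 3 bytes.
print(len(precomposed.encode("utf-8")), len(combining.encode("utf-8")))  # 2 3
```

The two strings render identically but compare as unequal, which is why a single consistent form matters for database lookups and string comparison.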
Contextual diacritic restoration requires a natural language processing model with a dictionary of millions of word forms per language. The word "resume" could be "résumé" (French/English) or remain "resume" (English verb). This ambiguity cannot be resolved without sentence-level NLP. The tool instead provides deterministic, predictable transformations that you control.
The tool applies one transformation at a time. However, since characters already carrying diacritics are not re-transformed, you can copy the output, paste it back, and apply a second diacritic type: the previously modified characters pass through unchanged while newly eligible characters receive the second mark. Language presets handle multi-diacritic scenarios in a single pass.
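
The two-pass workflow works because the lookup keys are base letters only, so an already-marked character never matches a second time. A minimal sketch, using hypothetical truncated tables and a hypothetical helper `apply_table`:

```python
# Hypothetical, truncated tables keyed by base letters only.
ACUTE = {"a": "á", "e": "é", "o": "ó"}
CEDILLA = {"c": "ç", "s": "ş", "t": "ţ"}

def apply_table(text: str, table: dict) -> str:
    """Map each character through the table, passing unknown ones through."""
    return "".join(table.get(ch, ch) for ch in text)

first_pass = apply_table("acetone", ACUTE)
second_pass = apply_table(first_pass, CEDILLA)

print(first_pass)    # ácétóné
print(second_pass)   # áçéţóné
# Re-applying the same table changes nothing, since á, é, ó are not keys.
print(apply_table(first_pass, ACUTE) == first_pass)  # True
```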
Each diacritic mapping includes both uppercase and lowercase variants. Inputting "A" with acute selected produces "Á" (U+00C1), while "a" produces "á" (U+00E1). Case is always preserved through the transformation.