Text Accent Remover - Strip Diacritics & Special Characters Online
Remove accents, diacritics, and special characters from text instantly. Converts é to e, ñ to n, ü to u using Unicode NFD normalization.
About
Accented characters cause silent failures in systems that expect ASCII input. Database collation mismatches, broken URL slugs, failed search queries, and CSV import errors can all trace back to unhandled diacritical marks. This tool applies Unicode Normalization Form D (NFD, canonical decomposition) to split precomposed characters such as é into the base letter e plus the combining acute accent U+0301, then strips the combining marks. Letters that have no canonical decomposition (ø, đ, ł, ß) are handled through direct mapping tables, since NFD alone cannot resolve them.
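The tool processes text in O(n) time, where n is the string length. It preserves whitespace, punctuation, and numbers, and it leaves non-Latin scripts (CJK, Arabic, Cyrillic) in place without transliterating them. Limitation: full non-Latin alphabets are not transliterated (for example, Greek α is not converted to a); only combining diacritical marks in the Unicode range U+0300 to U+036F are removed. Note that NFD also decomposes some non-Latin letters, so a pipeline that normalizes the whole string strips a mark in that range wherever it occurs; Cyrillic й, for instance, decomposes to и plus U+0306. Pro tip: always test your output against your target system's character whitelist before bulk processing.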
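The scope described above can be checked with a short Python sketch. It assumes the whole string is normalized with NFD and that only U+0300 to U+036F is stripped; `strip_range` and the sample strings are hypothetical names chosen for illustration, not the tool's own code.

```python
import re
import unicodedata

# Combining Diacritical Marks block, the range named in the text.
_MARKS = re.compile("[\u0300-\u036f]")

def strip_range(text: str) -> str:
    """Hypothetical helper: NFD-decompose, then drop marks in U+0300..U+036F."""
    return _MARKS.sub("", unicodedata.normalize("NFD", text))

samples = {
    "Greek alpha": "\u03b1",                         # α
    "CJK ideographs": "\u65e5\u672c\u8a9e",          # 日本語
    "Arabic letters": "\u0645\u0631\u062d\u0628\u0627",  # مرحبا
    "Cyrillic short i": "\u0439",                    # й
}

for label, original in samples.items():
    result = strip_range(original)
    codepoints = " ".join(f"U+{ord(c):04X}" for c in result)
    status = "unchanged" if result == original else "changed"
    print(f"{label}: {status} ({codepoints})")
```

Under these assumptions the first three samples come back unchanged, while й loses its breve and becomes и. An implementation that must keep such letters intact would need an extra rule, for example stripping marks only after Latin base letters.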
Formulas
The accent removal process follows a two-stage algorithm. Stage 1 applies Unicode Normalization Form D. Stage 2 strips combining diacritical marks and maps special characters.
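Read as a composition, and assuming the stages are applied in the order listed, the algorithm can be written as below. The name result is introduced here only for illustration.

$$
\text{result} = \text{mapSpecial}\bigl(\text{stripMarks}(\text{NFD}(s))\bigr)
$$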
Here, NFD(s) decomposes each precomposed character in s into its base character plus combining marks. For example, é (U+00E9) becomes e (U+0065) followed by the combining acute accent U+0301.
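To make the decomposition step concrete, the following Python sketch prints the code points a character has after NFD. It uses the standard `unicodedata` module; `nfd_codepoints` is a hypothetical helper name.

```python
import unicodedata

def nfd_codepoints(ch: str) -> list[str]:
    """Return the code points of ch after NFD, formatted as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

print(nfd_codepoints("\u00e9"))  # é -> ['U+0065', 'U+0301']
print(nfd_codepoints("\u00f8"))  # ø -> ['U+00F8'] (no canonical decomposition)
```

The second call shows why a separate mapping stage is needed: ø passes through NFD unchanged.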
The stripMarks function applies a regular expression that removes every code point in the Combining Diacritical Marks block (U+0300 to U+036F) from the decomposed string.
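A minimal Python sketch of this step, assuming the regular expression is a single character class over U+0300 to U+036F. The names `COMBINING_MARKS` and `strip_marks` are illustrative and not taken from the tool.

```python
import re
import unicodedata

# Character class covering the Combining Diacritical Marks block.
COMBINING_MARKS = re.compile("[\u0300-\u036f]")

def strip_marks(decomposed: str) -> str:
    """Remove combining marks from a string that is already in NFD form."""
    return COMBINING_MARKS.sub("", decomposed)

decomposed = unicodedata.normalize("NFD", "cr\u00e8me br\u00fbl\u00e9e")
print(strip_marks(decomposed))  # creme brulee
```

The function only has an effect on decomposed input. A precomposed é (U+00E9) contains no code point in the range, so it passes through unchanged unless NFD is applied first.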
The mapSpecial function handles characters that NFD cannot decompose: ligatures and modified letters with no combining mark equivalent, such as ø, đ, ł, and ß. Each is replaced with a fixed ASCII equivalent, for example ß becomes ss.
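One way to sketch such a mapping table in Python is with `str.maketrans` and `str.translate`. The entries below are limited to the lowercase letters listed in the reference table; `SPECIAL_MAP` and `map_special` are hypothetical names, and the tool's actual table may be larger.

```python
# Lowercase letters without a canonical decomposition, mapped to ASCII.
SPECIAL_MAP = str.maketrans({
    "\u00f8": "o",   # ø
    "\u0142": "l",   # ł
    "\u0111": "d",   # đ
    "\u00df": "ss",  # ß
    "\u00f0": "d",   # ð
    "\u00fe": "th",  # þ
})

def map_special(text: str) -> str:
    """Replace each mapped letter; all other characters pass through."""
    return text.translate(SPECIAL_MAP)

print(map_special("Stra\u00dfe"))  # Strasse
print(map_special("z\u0142oty"))   # zloty
```

Uppercase forms such as Ø or Ł are not covered by this sketch and would need their own entries.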
In these definitions, s is the input string. The algorithm runs in O(n) time, where n is the number of characters. The Unicode Combining Diacritical Marks block spans code points U+0300 through U+036F, which gives 112 combining marks, including accents, cedillas, the ogonek, the horn, and various other modifiers.
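The block size can be confirmed with a quick check. This sketch relies on Python's bundled Unicode database through `unicodedata`; the variable names are illustrative.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x0300, 0x036F

marks = [chr(cp) for cp in range(BLOCK_START, BLOCK_END + 1)]

# Number of code points in the block.
print(len(marks))  # 112

# Every code point in the block is a nonspacing mark (general category Mn).
print(all(unicodedata.category(m) == "Mn" for m in marks))  # True
```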
Reference Data
| Accented Character | Unicode Codepoint | NFD Decomposition | Result After Stripping | Language Origin |
|---|---|---|---|---|
| é | U+00E9 | e + U+0301 (acute) | e | French, Portuguese, Spanish |
| ñ | U+00F1 | n + U+0303 (tilde) | n | Spanish |
| ü | U+00FC | u + U+0308 (diaeresis) | u | German, Turkish |
| ç | U+00E7 | c + U+0327 (cedilla) | c | French, Portuguese, Turkish |
| ö | U+00F6 | o + U+0308 (diaeresis) | o | German, Swedish, Finnish |
| à | U+00E0 | a + U+0300 (grave) | a | French, Italian, Portuguese |
| â | U+00E2 | a + U+0302 (circumflex) | a | French, Romanian |
| ž | U+017E | z + U+030C (caron) | z | Czech, Slovak, Slovenian |
| ø | U+00F8 | Non-decomposable (special map) | o | Danish, Norwegian |
| ł | U+0142 | Non-decomposable (special map) | l | Polish |
| đ | U+0111 | Non-decomposable (special map) | d | Croatian, Vietnamese |
| ß | U+00DF | Non-decomposable (special map) | ss | German |
| å | U+00E5 | a + U+030A (ring above) | a | Swedish, Norwegian, Danish |
| ă | U+0103 | a + U+0306 (breve) | a | Romanian, Vietnamese |
| ț | U+021B | t + U+0326 (comma below) | t | Romanian |
| ś | U+015B | s + U+0301 (acute) | s | Polish |
| ī | U+012B | i + U+0304 (macron) | i | Latvian, Latin transliteration |
| ğ | U+011F | g + U+0306 (breve) | g | Turkish |
| ê | U+00EA | e + U+0302 (circumflex) | e | French, Portuguese |
| ï | U+00EF | i + U+0308 (diaeresis) | i | French, Catalan |
| ô | U+00F4 | o + U+0302 (circumflex) | o | French, Portuguese, Slovak |
| ū | U+016B | u + U+0304 (macron) | u | Latvian, Japanese romaji |
| ý | U+00FD | y + U+0301 (acute) | y | Czech, Icelandic |
| ð | U+00F0 | Non-decomposable (special map) | d | Icelandic, Old English |
| þ | U+00FE | Non-decomposable (special map) | th | Icelandic, Old English |
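The rows above can be spot-checked with a small end-to-end sketch in Python. It assumes the stage order NFD, strip marks, map specials; `remove_accents`, `_MARKS`, `_SPECIAL`, and `ROWS` are hypothetical names, and the mapping covers only the non-decomposable letters listed in the table.

```python
import re
import unicodedata

# Stage 2a: marks in the Combining Diacritical Marks block.
_MARKS = re.compile("[\u0300-\u036f]")

# Stage 2b: the non-decomposable rows of the table.
_SPECIAL = str.maketrans({
    "\u00f8": "o", "\u0142": "l", "\u0111": "d",
    "\u00df": "ss", "\u00f0": "d", "\u00fe": "th",
})

def remove_accents(text: str) -> str:
    """Hypothetical end-to-end helper: NFD, strip marks, map specials."""
    decomposed = unicodedata.normalize("NFD", text)
    return _MARKS.sub("", decomposed).translate(_SPECIAL)

# (input, expected) pairs copied from the table: é, ñ, ž, ț, ø, ß, þ.
ROWS = [("\u00e9", "e"), ("\u00f1", "n"), ("\u017e", "z"), ("\u021b", "t"),
        ("\u00f8", "o"), ("\u00df", "ss"), ("\u00fe", "th")]

mismatches = [(src, remove_accents(src)) for src, want in ROWS
              if remove_accents(src) != want]
print(mismatches)  # []
```

An empty list means every sampled row produced the value in the "Result After Stripping" column.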