About

Mistyped diacritical marks corrupt data silently. A name like Müller indexed as Muller fails exact-match database lookups, breaks expected sort order, and can create mismatches in legal documents where names must match exactly. This tool provides the full Unicode Latin Extended set: over 300 precomposed characters covering acute (é), grave (è), circumflex (ê), tilde (ñ), umlaut (ü), cedilla (ç), caron (š), and other diacritics used across European, Turkic, and Vietnamese orthographies. Every character is precomposed (NFC-normalized) rather than a combining sequence, so it renders consistently across systems.

Pro tip: precomposed characters (e.g., é, U+00E9) are safer for filenames, URLs, and databases than the equivalent combining sequence (e, U+0065, followed by U+0301 COMBINING ACUTE ACCENT). This tool outputs only precomposed forms. Some rare combinations, such as x̄ (x with macron), have no precomposed Unicode codepoint and can only be written with combining marks; those cases are outside this tool's scope.

Formulas

This tool performs direct Unicode character insertion, not mathematical transformation. The core logic maps a base letter to its precomposed diacritical variants using a lookup dictionary.

lookup(base) → { c₁, c₂, …, cₙ }

where base ∈ {A, …, Z} and each cᵢ is a precomposed NFC codepoint. Text insertion uses cursor-position slicing:

result = text[0..cursor] + cᵢ + text[cursor..end]

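where cursor is the input's current selectionStart index, and both slices are half-open: text[0..cursor] ends just before cursor, and text[cursor..end] starts at it. Clipboard operations use the asynchronous navigator.clipboard.writeText API, with the deprecated document.execCommand("copy") as a fallback for older browsers.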
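The lookup and insertion steps above can be sketched in Python. This is a minimal sketch, not the tool's code: it assumes the dictionary is derived by composing each base letter with a list of combining marks and keeping only the results that NFC collapses into a single codepoint, so combinations like x + macron simply produce no entry. `COMBINING_MARKS`, `build_lookup`, and `insert_at_cursor` are hypothetical names. Python indexes strings by code point while selectionStart counts UTF-16 code units; for the Basic Multilingual Plane characters involved here the two coincide.

```python
import unicodedata

# Hypothetical subset of combining marks matching diacritics in the reference table.
COMBINING_MARKS = [
    "\u0301",  # acute
    "\u0300",  # grave
    "\u0302",  # circumflex
    "\u0303",  # tilde
    "\u0304",  # macron
    "\u0306",  # breve
    "\u0307",  # dot above
    "\u0308",  # diaeresis / umlaut
    "\u030A",  # ring above
    "\u030B",  # double acute
    "\u030C",  # caron
    "\u0327",  # cedilla
    "\u0328",  # ogonek
]


def build_lookup(bases: str = "abcdefghijklmnopqrstuvwxyz") -> dict:
    """Map each base letter to the precomposed variants that NFC can produce."""
    table = {}
    for base in bases:
        variants = []
        for mark in COMBINING_MARKS:
            composed = unicodedata.normalize("NFC", base + mark)
            # Keep only combinations that collapse into one precomposed codepoint.
            if len(composed) == 1:
                variants.append(composed)
        table[base] = variants
    return table


def insert_at_cursor(text: str, cursor: int, char: str) -> str:
    """result = text[0..cursor] + char + text[cursor..end], with half-open slices."""
    return text[:cursor] + char + text[cursor:]


LOOKUP = build_lookup()
print(LOOKUP["e"])
print(insert_at_cursor("Mller", 1, "ü"))  # Müller
```

Deriving the table from NFC, instead of typing it by hand, guarantees by construction that every entry is a single precomposed codepoint.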

Reference Data

Diacritic / Letter	Symbol	Example	Languages	Unicode Block
Acute	´	é, á	French, Spanish, Portuguese, Hungarian, Czech	Latin-1 Supplement
Grave	`	è, à	French, Italian, Portuguese, Catalan	Latin-1 Supplement
Circumflex	^	ê, â	French, Portuguese, Romanian, Vietnamese	Latin-1 Supplement
Tilde	~	ñ, ã	Spanish, Portuguese, Estonian, Vietnamese	Latin-1 Supplement
Umlaut / Diaeresis	¨	ü, ö	German, Swedish, Finnish, Turkish, Hungarian	Latin-1 Supplement
Cedilla	¸	ç, ş	French, Portuguese, Turkish, Catalan	Latin-1 Supplement / Extended-A
Caron / Háček	ˇ	š, č	Czech, Slovak, Slovenian, Croatian, Lithuanian	Latin Extended-A
Ring Above	˚	å, ů	Swedish, Norwegian, Danish, Czech	Latin-1 Supplement / Extended-A
Ogonek	˛	ą, ę	Polish, Lithuanian, Navajo	Latin Extended-A
Macron	¯	ā, ō	Latvian, Māori, Japanese Rōmaji, Hawaiian	Latin Extended-A
Breve	˘	ă, ğ	Romanian, Turkish, Vietnamese	Latin Extended-A
Dot Above	˙	ż, ġ	Polish, Lithuanian, Maltese, Turkish	Latin Extended-A
Double Acute	˝	ő, ű	Hungarian	Latin Extended-A
Stroke / Bar	/	ø, đ	Danish, Norwegian, Vietnamese, Sami	Latin-1 Supplement / Extended-A
Horn	◌̛	ơ, ư	Vietnamese	Latin Extended-B
Eth	-	ð, Ð	Icelandic, Faroese, Old English	Latin-1 Supplement
Thorn	-	þ, Þ	Icelandic, Old English	Latin-1 Supplement
Eszett / Sharp S	-	ß, ẞ	German	Latin-1 Supplement / Extended Additional
Ligature AE	-	æ, Æ	Danish, Norwegian, Icelandic, Old English	Latin-1 Supplement
Ligature OE	-	œ, Œ	French	Latin Extended-A
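To check which row of the table a character belongs to, Python's standard unicodedata module reports its codepoint, name, and canonical decomposition. The sketch below uses a hypothetical `describe` helper. Letters with an empty decomposition (eth, thorn, ß, the ligatures, and stroke letters such as ø and đ) are standalone letters rather than base + mark pairs, which is why NFD leaves them untouched.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return the codepoint, Unicode name, and canonical decomposition of one character."""
    return {
        "char": ch,
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # unicodedata returns an empty string when there is no decomposition.
        "decomposition": unicodedata.decomposition(ch) or "(none)",
    }


for sample in "éşőơøþß":
    info = describe(sample)
    print(info["char"], info["codepoint"], info["name"], info["decomposition"])
```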

Frequently Asked Questions

A precomposed character like é (U+00E9) is a single codepoint. A combining sequence uses two codepoints: the base letter e (U+0065) followed by combining acute accent (U+0301). Both may render identically, but precomposed forms are safer for string comparison, database storage, filenames, and URL slugs. This tool outputs only precomposed (NFC) characters.
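A short Python sketch of the difference, using a hypothetical `codepoints` helper: the two spellings of é differ in codepoint count and UTF-8 byte length, and normalizing to NFC turns the combining sequence into the precomposed form.

```python
import unicodedata


def codepoints(text: str) -> list:
    """List each codepoint of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


precomposed = "\u00e9"  # é as a single codepoint
combining = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

print(codepoints(precomposed), len(precomposed.encode("utf-8")))  # ['U+00E9'] 2
print(codepoints(combining), len(combining.encode("utf-8")))      # ['U+0065', 'U+0301'] 3
print(unicodedata.normalize("NFC", combining) == precomposed)     # True
```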

Your file or database column is probably using ASCII or Latin-1 (ISO 8859-1) instead of UTF-8. Characters the encoding cannot represent are replaced with ?, and UTF-8 bytes read back as Latin-1 or Windows-1252 turn into mojibake (e.g., é appearing as Ã©). Store and transmit text as UTF-8. In MySQL, use the utf8mb4 character set: the older utf8 (an alias for utf8mb3) stores at most 3 bytes per character and therefore covers only the Basic Multilingual Plane.
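Both failure modes can be reproduced in a few lines of Python. This sketch only illustrates the symptoms; the `to_mojibake` helper name is hypothetical.

```python
def to_mojibake(text: str) -> str:
    """Simulate UTF-8 bytes being misread as Latin-1 (ISO 8859-1)."""
    return text.encode("utf-8").decode("latin-1")


# Mojibake: é is C3 A9 in UTF-8, which Latin-1 reads as Ã and ©.
print(to_mojibake("é"))  # Ã©

# Replacement: characters the target encoding lacks become "?".
print("Müller".encode("ascii", errors="replace"))     # b'M?ller'
print("Šimůnek".encode("latin-1", errors="replace"))  # b'?im?nek'
```

The last line shows that Latin-1 storage is not enough even for European text: Š and ů are outside ISO 8859-1.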

Vietnamese is supported. Vietnamese stacks tone marks on top of vowel diacritics, so a single letter can carry two marks (e.g., ệ). This tool includes precomposed Vietnamese characters such as ơ (o with horn) and ư (u with horn), plus their combinations with acute, grave, hook above, tilde, and dot below from the Latin Extended Additional block (U+1EA0 to U+1EF9).
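A small Python check of the stacking behavior: ệ (e with circumflex and dot below) is a single codepoint in Latin Extended Additional, NFD splits it into three codepoints, and NFC recomposes it regardless of the order in which the two marks were typed. `in_vietnamese_block` is a hypothetical helper name.

```python
import unicodedata


def in_vietnamese_block(ch: str) -> bool:
    """True if ch lies in U+1EA0..U+1EF9, the Vietnamese range of Latin Extended Additional."""
    return 0x1EA0 <= ord(ch) <= 0x1EF9


e_circ_dot = "\u1ec7"  # ệ
print(in_vietnamese_block(e_circ_dot))  # True

# NFD orders dot below (combining class 220) before circumflex (230).
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", e_circ_dot)])
# ['U+0065', 'U+0323', 'U+0302']

# Either typing order composes back to the same precomposed letter.
print(unicodedata.normalize("NFC", "e\u0302\u0323") == e_circ_dot)  # True
print(unicodedata.normalize("NFC", "e\u0323\u0302") == e_circ_dot)  # True
```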

On Windows, use Alt codes typed on the numeric keypad (e.g., Alt+0233 for é) or enable the US-International keyboard layout, where ' followed by e produces é. On macOS, press Option+E, then E, for é. On Linux, use Compose key sequences (e.g., Compose, ', e for é). Each method requires memorizing a sequence per character; this tool removes that overhead by letting you browse characters visually.
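Alt+0 followed by a decimal number inserts the character at that position in the system's ANSI code page. The sketch below assumes that code page is Windows-1252 (typical for Western European locales) and uses a hypothetical `alt_code` helper to compute the code. Characters outside Windows-1252, such as ő or ğ, have no Alt+0 code on such systems.

```python
from typing import Optional


def alt_code(ch: str, codepage: str = "cp1252") -> Optional[str]:
    """Return the Windows Alt+0NNN code for ch in the given ANSI code page, or None."""
    try:
        encoded = ch.encode(codepage)
    except UnicodeEncodeError:
        return None  # not representable in this code page
    if len(encoded) != 1:
        return None  # Alt+0NNN codes address single-byte code pages only
    return f"Alt+0{encoded[0]:03d}"


for ch in "éšőğ":
    print(ch, alt_code(ch))
```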

Diacritics matter for SEO, but differently in URLs and in content. Many CMSs strip or transliterate accented characters in URL slugs (e.g., é becomes e). In page content and metadata, however, correct diacritics improve relevance for queries in French, Spanish, German, and other languages. Google generally treats cafe and café as related but not identical terms. Use accented forms in body text and titles, and transliterated ASCII in URLs.
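One common way to produce ASCII slugs is to decompose with NFD, drop the combining marks, and map letters that have no decomposition (ß, æ, ø, đ, and similar) by hand. This is a sketch, not any particular CMS's algorithm; the `slugify` name and `EXTRA_MAP` contents are assumptions, and language-specific conventions such as German ü → ue are deliberately not applied.

```python
import re
import unicodedata

# Hypothetical, partial mapping for letters with no canonical decomposition.
EXTRA_MAP = {
    "ß": "ss", "ẞ": "SS", "æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE",
    "ø": "o", "Ø": "O", "đ": "d", "Đ": "D", "ð": "d", "Ð": "D",
    "þ": "th", "Þ": "Th", "ł": "l", "Ł": "L",
}


def slugify(text: str) -> str:
    """Transliterate text to a lowercase ASCII URL slug."""
    text = "".join(EXTRA_MAP.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (nonzero canonical combining class), keeping base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


print(slugify("Café Müller"))        # cafe-muller
print(slugify("Straße in Øresund"))  # strasse-in-oresund
```

The explicit map is needed because NFD never splits ø or ß into base + mark, so stripping marks alone would silently drop them at the ASCII step.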

Unicode defines four normalization forms: NFC, NFD, NFKC, and NFKD. NFC (canonical decomposition followed by canonical composition) produces precomposed characters where possible; NFD decomposes them into a base letter plus combining marks. Two strings that look identical can fail an equality check if one is NFC and the other NFD, which leads to missed database index lookups, password hashes that don't match, and digital signatures that fail to verify. This tool outputs NFC characters to minimize such issues.
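A minimal Python illustration, assuming text is normalized to NFC before it is compared or hashed: the NFC and NFD spellings of Müller compare unequal and hash differently, but agree once both pass through the same normalization. `normalized_digest` is a hypothetical helper, and SHA-256 stands in for whatever hashing step applies; password storage would use a dedicated password-hashing function.

```python
import hashlib
import unicodedata


def normalized_digest(text: str, form: str = "NFC") -> str:
    """SHA-256 hex digest of text after Unicode normalization to the given form."""
    return hashlib.sha256(unicodedata.normalize(form, text).encode("utf-8")).hexdigest()


nfc_name = "M\u00fcller"                           # ü as one codepoint
nfd_name = unicodedata.normalize("NFD", nfc_name)  # u + COMBINING DIAERESIS

print(nfc_name == nfd_name)  # False
raw = [hashlib.sha256(s.encode("utf-8")).hexdigest() for s in (nfc_name, nfd_name)]
print(raw[0] == raw[1])  # False
print(normalized_digest(nfc_name) == normalized_digest(nfd_name))  # True
```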