Non-Latin to HTML Entity Converter
Convert non-Latin and special characters to HTML entities, CSS hex codes, JS Unicode escapes, and URL encoding. Supports full Unicode.
About
Incorrect character encoding causes mojibake - garbled text that destroys user trust and breaks search engine indexing. Any non-ASCII character (codepoint above U+007F) risks corruption when transmitted through systems that assume 7-bit ASCII or mismatched codepages. This tool converts every such character in your text to its precise numeric or named entity representation: HTML decimal (`&#228;`), HTML hexadecimal (`&#xE4;`), named HTML (`&auml;`), CSS hex (`\00E4`), JavaScript Unicode escape (`\u00E4`), or URL percent-encoding (`%C3%A4`). It handles the full Unicode range, including astral plane characters above U+FFFF, by using surrogate-aware codepoint iteration.
The converter processes characters by comparing each codepoint against the ASCII boundary at 0x7F. Characters at or below this threshold pass through unchanged in "non-Latin only" mode. The tool assumes UTF-8 source text and does not handle legacy multi-byte encodings like Shift_JIS or Big5 - pre-convert those to UTF-8 first. Pro tip: named HTML entities (like `&auml;`) improve source readability but exist only for a fixed set of characters: 252 in HTML 4 and a little over 2,000 in the current HTML standard. Numeric entities are not tied to a name list, so they can represent characters, such as CJK ideographs and emoji, that have no named form.
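To illustrate why surrogate-aware iteration matters for astral plane characters, here is a minimal JavaScript sketch. It is not the tool's actual source; the variable names are hypothetical and chosen only for this example.

```javascript
// "ä" (U+00E4) followed by "😀" (U+1F600), written with escapes for clarity.
const sample = "\u00E4\u{1F600}";

// .length counts UTF-16 code units, so the emoji contributes two of them.
const codeUnitCount = sample.length;

// for...of walks the string by codepoint and keeps surrogate pairs intact.
const codepoints = [];
for (const ch of sample) {
  codepoints.push(ch.codePointAt(0));
}

// Only codepoints above the ASCII boundary would need encoding.
const needsEncoding = codepoints.filter((cp) => cp > 0x7f);

console.log(codeUnitCount);                                  // 3
console.log(codepoints.map((cp) => cp.toString(16)));        // [ 'e4', '1f600' ]
console.log(needsEncoding.length);                           // 2
```

Indexing the same string by code unit (for example with `charCodeAt`) would instead yield the two surrogate halves 0xD83D and 0xDE00, which are not valid characters on their own.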
Formulas
Each character in the input string is examined by its Unicode codepoint cp, obtained via codePointAt. The ASCII boundary is defined at codepoint 0x7F (127 decimal). In "non-Latin only" mode, characters satisfying cp ≤ 0x7F pass through unchanged.
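The pass-through rule can be sketched as follows, using HTML decimal as the output format. The function name `toHtmlDecimal` and the `nonLatinOnly` option are hypothetical, and the behavior when the option is switched off (encoding every character) is an assumption about what the other mode does.

```javascript
const ASCII_MAX = 0x7f; // ASCII boundary described in the text

// Encode a string as HTML decimal entities.
// With nonLatinOnly = true, codepoints <= 0x7F are copied through unchanged.
function toHtmlDecimal(text, { nonLatinOnly = true } = {}) {
  let out = "";
  for (const ch of text) {            // iterates by codepoint
    const cp = ch.codePointAt(0);
    out += nonLatinOnly && cp <= ASCII_MAX ? ch : `&#${cp};`;
  }
  return out;
}

console.log(toHtmlDecimal("Gr\u00FC\u00DFe"));                // Gr&#252;&#223;e
console.log(toHtmlDecimal("A", { nonLatinOnly: false }));     // &#65;
```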
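The encoding functions per format are:
| Format | Output for codepoint cp |
|---|---|
| HTML Decimal | `&#` + cp (decimal) + `;` |
| HTML Hex | `&#x` + toHex(cp) + `;` |
| HTML Named | `&` + entityName + `;` |
| CSS Hex | `\` + padHex(cp, 4) |
| JS Escape | `\u` + padHex(cp, 4) for cp ≤ 0xFFFF, otherwise `\u{` + toHex(cp) + `}` |
| URL | `%` + two-digit uppercase hex for each UTF-8 byte of the character |
Where cp = Unicode codepoint (integer), toHex(cp) converts cp to an uppercase hexadecimal string, and padHex(cp, n) zero-pads that string to at least n digits. For HTML Named entities, a dictionary lookup maps cp → entityName. If no named entity exists, the converter falls back to HTML Hex format.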
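The helper definitions above might look like this in JavaScript. This is a sketch under stated assumptions rather than the tool's implementation: the `NAMED` map holds only three illustrative entries instead of a full entity dictionary, and the `encoders` and `convert` names are hypothetical.

```javascript
const toHex = (cp) => cp.toString(16).toUpperCase();
const padHex = (cp, n) => toHex(cp).padStart(n, "0");

// Tiny illustrative subset of the named-entity dictionary (assumption).
const NAMED = new Map([
  [0xe4, "auml"],
  [0xa9, "copy"],
  [0x20ac, "euro"],
]);

const encoders = {
  htmlDecimal: (cp) => `&#${cp};`,
  htmlHex: (cp) => `&#x${toHex(cp)};`,
  // Falls back to HTML Hex when the codepoint has no named entity.
  htmlNamed: (cp) =>
    NAMED.has(cp) ? `&${NAMED.get(cp)};` : encoders.htmlHex(cp),
  cssHex: (cp) => `\\${padHex(cp, 4)}`,
  // \uXXXX only fits the BMP; astral codepoints use the \u{...} form.
  jsEscape: (cp) =>
    cp <= 0xffff ? `\\u${padHex(cp, 4)}` : `\\u{${toHex(cp)}}`,
};

// Apply one encoder to every codepoint above the ASCII boundary.
function convert(text, format) {
  const encode = encoders[format];
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += cp <= 0x7f ? ch : encode(cp);
  }
  return out;
}

console.log(convert("\u00E4", "htmlNamed"));   // &auml;
console.log(convert("\u4E2D", "htmlNamed"));   // &#x4E2D; (no named entity)
console.log(convert("\u{1F600}", "jsEscape")); // \u{1F600}
```

One caveat when using the CSS output: a CSS escape may contain up to six hex digits, so a four-digit escape followed directly by another hex digit character can be misread. Adding a trailing space after the escape or padding to six digits avoids that.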
Reference Data
| Character | Name | Codepoint | HTML Decimal | HTML Hex | HTML Named | CSS Hex | JS Escape |
|---|---|---|---|---|---|---|---|
| ä | Latin Small A with Diaeresis | U+00E4 | `&#228;` | `&#xE4;` | `&auml;` | `\00E4` | `\u00E4` |
| ö | Latin Small O with Diaeresis | U+00F6 | `&#246;` | `&#xF6;` | `&ouml;` | `\00F6` | `\u00F6` |
| ü | Latin Small U with Diaeresis | U+00FC | `&#252;` | `&#xFC;` | `&uuml;` | `\00FC` | `\u00FC` |
| ß | Latin Small Sharp S | U+00DF | `&#223;` | `&#xDF;` | `&szlig;` | `\00DF` | `\u00DF` |
| é | Latin Small E with Acute | U+00E9 | `&#233;` | `&#xE9;` | `&eacute;` | `\00E9` | `\u00E9` |
| ñ | Latin Small N with Tilde | U+00F1 | `&#241;` | `&#xF1;` | `&ntilde;` | `\00F1` | `\u00F1` |
| © | Copyright Sign | U+00A9 | `&#169;` | `&#xA9;` | `&copy;` | `\00A9` | `\u00A9` |
| € | Euro Sign | U+20AC | `&#8364;` | `&#x20AC;` | `&euro;` | `\20AC` | `\u20AC` |
| £ | Pound Sign | U+00A3 | `&#163;` | `&#xA3;` | `&pound;` | `\00A3` | `\u00A3` |
| ¥ | Yen Sign | U+00A5 | `&#165;` | `&#xA5;` | `&yen;` | `\00A5` | `\u00A5` |
| 中 | CJK Unified - Middle | U+4E2D | `&#20013;` | `&#x4E2D;` | - | `\4E2D` | `\u4E2D` |
| 日 | CJK Unified - Sun/Day | U+65E5 | `&#26085;` | `&#x65E5;` | - | `\65E5` | `\u65E5` |
| Д | Cyrillic Capital De | U+0414 | `&#1044;` | `&#x414;` | - | `\0414` | `\u0414` |
| я | Cyrillic Small Ya | U+044F | `&#1103;` | `&#x44F;` | - | `\044F` | `\u044F` |
| α | Greek Small Alpha | U+03B1 | `&#945;` | `&#x3B1;` | `&alpha;` | `\03B1` | `\u03B1` |
| π | Greek Small Pi | U+03C0 | `&#960;` | `&#x3C0;` | `&pi;` | `\03C0` | `\u03C0` |
| → | Rightwards Arrow | U+2192 | `&#8594;` | `&#x2192;` | `&rarr;` | `\2192` | `\u2192` |
| ∞ | Infinity | U+221E | `&#8734;` | `&#x221E;` | `&infin;` | `\221E` | `\u221E` |
| ♠ | Black Spade Suit | U+2660 | `&#9824;` | `&#x2660;` | `&spades;` | `\2660` | `\u2660` |
| 😀 | Grinning Face | U+1F600 | `&#128512;` | `&#x1F600;` | - | `\1F600` | `\u{1F600}` |
| ™ | Trade Mark Sign | U+2122 | `&#8482;` | `&#x2122;` | `&trade;` | `\2122` | `\u2122` |
| ® | Registered Sign | U+00AE | `&#174;` | `&#xAE;` | `&reg;` | `\00AE` | `\u00AE` |
| ° | Degree Sign | U+00B0 | `&#176;` | `&#xB0;` | `&deg;` | `\00B0` | `\u00B0` |
| µ | Micro Sign | U+00B5 | `&#181;` | `&#xB5;` | `&micro;` | `\00B5` | `\u00B5` |
| ½ | Vulgar Fraction One Half | U+00BD | `&#189;` | `&#xBD;` | `&frac12;` | `\00BD` | `\u00BD` |
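The table has no column for URL percent-encoding, the sixth format listed in the About section. A minimal sketch of that format, relying on the standard `encodeURIComponent` function to produce the UTF-8 bytes, could look like this. The function name `percentEncodeNonAscii` is hypothetical, and leaving ASCII characters (including spaces) untouched is an assumption carried over from "non-Latin only" mode.

```javascript
// Percent-encode only the characters above the ASCII boundary.
function percentEncodeNonAscii(text) {
  let out = "";
  for (const ch of text) {            // iterates by codepoint
    // encodeURIComponent emits %XX for each UTF-8 byte of the character.
    out += ch.codePointAt(0) <= 0x7f ? ch : encodeURIComponent(ch);
  }
  return out;
}

console.log(percentEncodeNonAscii("\u00E4"));     // %C3%A4
console.log(percentEncodeNonAscii("\u20AC"));     // %E2%82%AC
console.log(percentEncodeNonAscii("\u{1F600}"));  // %F0%9F%98%80
```

Note that `encodeURIComponent` throws a `URIError` on an unpaired surrogate, so input containing broken UTF-16 would need to be cleaned up first.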