Character Encoding to ASCII/Unicode Escape Converter
Convert native characters to \uXXXX Unicode escapes or reverse ASCII escapes back to native text. Supports full Unicode, file upload, and download.
About
Internationalized source files often contain characters outside the ASCII range (0 - 127). Some build systems, Java .properties files, and certain legacy toolchains require every non-ASCII code point to be expressed as a \uXXXX escape sequence. Getting this wrong means garbled strings at runtime, broken localization bundles, or silent data loss when a pipeline strips high bytes. This tool performs the same transformation as the classic native2ascii utility: each code point above U+007F is replaced by a \uXXXX escape with four hexadecimal digits. Supplementary characters above U+FFFF are emitted as a surrogate pair, that is, two consecutive escapes. The reverse mode parses every \uXXXX token and reconstructs the original native text, including recombining surrogate pairs into their proper code points.
The conversion is deterministic and lossless for all Unicode planes (0 - 16). Note: the tool assumes the input is valid text as delivered by the browser, which has already decoded it from UTF-8 or UTF-16. If your source file uses a single-byte encoding like ISO-8859-1, open it in a text editor with the correct encoding first, then paste the text here. Pro tip: before Java 9, .properties resource bundles were read as ISO-8859-1, so any character outside that encoding had to be written as a \uXXXX escape. Run every localized bundle that targets such a runtime through native-to-ASCII before packaging.
Formulas
The native-to-ASCII conversion operates on each code point c in the input string:
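$$
\operatorname{esc}(c) =
\begin{cases}
\backslash\mathtt{u} + \operatorname{hex}(c)_4, & \text{if } \mathtt{0x80} \le c \le \mathtt{0xFFFF} \\
\backslash\mathtt{u} + \operatorname{hex}(H)_4 + \backslash\mathtt{u} + \operatorname{hex}(L)_4, & \text{if } c > \mathtt{0xFFFF}
\end{cases}
$$

Code points below 0x80 are copied through unchanged. For supplementary code points (c > 0xFFFF), the surrogate pair is computed with the standard UTF-16 formula:

$$
H = \mathtt{0xD800} + \left\lfloor \frac{c - \mathtt{0x10000}}{\mathtt{0x400}} \right\rfloor, \qquad
L = \mathtt{0xDC00} + \bigl( (c - \mathtt{0x10000}) \bmod \mathtt{0x400} \bigr)
$$

where H = high surrogate (0xD800 - 0xDBFF), L = low surrogate (0xDC00 - 0xDFFF), and $\operatorname{hex}(n)_4$ denotes the zero-padded four-digit hexadecimal representation of n.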
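The per-code-point rule above can be sketched in JavaScript as follows. This is a minimal illustration, not the tool's actual source: the names `nativeToAscii` and `hex4` are hypothetical, and uppercase hex digits are an assumption chosen to match the reference table.

```javascript
// Hypothetical helper: zero-padded, four-digit, uppercase hex of n.
function hex4(n) {
  return n.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical sketch of native-to-ASCII conversion.
function nativeToAscii(text) {
  let out = "";
  // for...of walks the string by code point, not by UTF-16 code unit.
  for (const ch of text) {
    const c = ch.codePointAt(0);
    if (c < 0x80) {
      // ASCII passes through unchanged.
      out += ch;
    } else if (c <= 0xffff) {
      // Basic Multilingual Plane: a single escape.
      out += "\\u" + hex4(c);
    } else {
      // Supplementary plane: split into high and low surrogates.
      const offset = c - 0x10000;
      const high = 0xd800 + (offset >> 10);
      const low = 0xdc00 + (offset & 0x3ff);
      out += "\\u" + hex4(high) + "\\u" + hex4(low);
    }
  }
  return out;
}

console.log(nativeToAscii("caf\u00e9 \u{1F600}"));
// prints: caf\u00E9 \uD83D\uDE00
```

For U+1F600 the offset is 0xF600, so the high surrogate is 0xD800 + 0x3D = 0xD83D and the low surrogate is 0xDC00 + 0x200 = 0xDE00, which agrees with the reference table.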
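The reverse operation uses the regex pattern /\\u([0-9A-Fa-f]{4})/g to locate each escape token and replaces it with String.fromCharCode(parseInt(hex, 16)), where hex is the four-digit captured group (not the full match, which still contains the \u prefix). Because JavaScript strings are sequences of UTF-16 code units, a high surrogate immediately followed by a low surrogate already represents the original supplementary code point, so no separate recombination step is needed.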
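A minimal sketch of the reverse direction, using the regex quoted above. The function name `asciiToNative` is hypothetical, and the sketch assumes the input contains no escaped backslashes: a literal `\\u0041` in a .properties file would be decoded here even though it should not be.

```javascript
// Hypothetical sketch of ASCII-to-native conversion.
function asciiToNative(text) {
  // The callback receives the full match first, then the captured hex digits.
  return text.replace(/\\u([0-9A-Fa-f]{4})/g, (_match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

const decoded = asciiToNative("\\uD83D\\uDE00");
// The two escapes become two adjacent UTF-16 code units,
// which together form one supplementary code point.
console.log(decoded.codePointAt(0).toString(16)); // prints: 1f600
```

Passing the captured group rather than the whole match matters: `parseInt("\\u00E9", 16)` yields `NaN`, whereas `parseInt("00E9", 16)` yields 233.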
Reference Data
| Character | Code Point | Unicode Escape | Category | Script |
|---|---|---|---|---|
| é | U+00E9 | \u00E9 | Lowercase Letter | Latin |
| ñ | U+00F1 | \u00F1 | Lowercase Letter | Latin |
| ü | U+00FC | \u00FC | Lowercase Letter | Latin |
| 中 | U+4E2D | \u4E2D | CJK Ideograph | Han |
| 日 | U+65E5 | \u65E5 | CJK Ideograph | Han |
| 本 | U+672C | \u672C | CJK Ideograph | Han |
| 한 | U+D55C | \uD55C | Syllable | Hangul |
| 글 | U+AE00 | \uAE00 | Syllable | Hangul |
| Ω | U+03A9 | \u03A9 | Uppercase Letter | Greek |
| π | U+03C0 | \u03C0 | Lowercase Letter | Greek |
| Д | U+0414 | \u0414 | Uppercase Letter | Cyrillic |
| я | U+044F | \u044F | Lowercase Letter | Cyrillic |
| א | U+05D0 | \u05D0 | Letter | Hebrew |
| ع | U+0639 | \u0639 | Letter | Arabic |
| ₹ | U+20B9 | \u20B9 | Currency Symbol | Common |
| € | U+20AC | \u20AC | Currency Symbol | Common |
| £ | U+00A3 | \u00A3 | Currency Symbol | Common |
| © | U+00A9 | \u00A9 | Symbol | Common |
| ™ | U+2122 | \u2122 | Symbol | Common |
| ∞ | U+221E | \u221E | Math Symbol | Common |
| → | U+2192 | \u2192 | Arrow | Common |
| 😀 | U+1F600 | \uD83D\uDE00 | Emoji | Common (Surrogate Pair) |
| 🎵 | U+1F3B5 | \uD83C\uDFB5 | Emoji | Common (Surrogate Pair) |
| 𝄞 | U+1D11E | \uD834\uDD1E | Musical Symbol | Common (Surrogate Pair) |
| 𐍈 | U+10348 | \uD800\uDF48 | Letter | Gothic (Surrogate Pair) |