About

Internationalized source files often contain characters outside the ASCII range (0 to 127). Some build systems, Java .properties files, and certain legacy toolchains require every non-ASCII code point to be expressed as a \uXXXX escape sequence. Getting this wrong means garbled strings at runtime, broken localization bundles, or silent data loss when a pipeline strips high bytes. This tool performs the same transformation as the classic native2ascii utility: each code point above U+007F is replaced by a four-digit hexadecimal escape. Supplementary characters above U+FFFF are emitted as a surrogate pair, that is, two consecutive escapes. The reverse mode parses every \uXXXX token and reconstructs the original native text, recombining surrogate pairs into their proper code points.

The conversion is deterministic and lossless for all Unicode planes (0 to 16). Note: the tool assumes the input is valid UTF-8/UTF-16 as delivered by the browser. If your source file uses a single-byte encoding such as ISO-8859-1, open it in a text editor with the correct encoding first, then paste the text here. Pro tip: before Java 9, .properties resource bundles were read as ISO-8859-1, so any character outside that encoding had to be escaped. Running every localized bundle through native-to-ASCII before packaging keeps it safe for those toolchains.


Formulas

The native-to-ASCII conversion operates on each code point c in the input string:

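$$
\begin{cases}
c \ \text{(unchanged)}, & \text{if } c \le \texttt{0x7F} \\
\texttt{\textbackslash u} + \operatorname{hex}(c)_4, & \text{if } \texttt{0x80} \le c \le \texttt{0xFFFF} \\
\texttt{\textbackslash u} + \operatorname{hex}(H)_4 + \texttt{\textbackslash u} + \operatorname{hex}(L)_4, & \text{if } c > \texttt{0xFFFF}
\end{cases}
$$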

For supplementary code points (c > 0xFFFF), the surrogate pair is computed as:

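$$
H = \left\lfloor \frac{c - \texttt{0x10000}}{\texttt{0x400}} \right\rfloor + \texttt{0xD800}
$$
$$
L = \bigl((c - \texttt{0x10000}) \bmod \texttt{0x400}\bigr) + \texttt{0xDC00}
$$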

where H = high surrogate (0xD800 to 0xDBFF), L = low surrogate (0xDC00 to 0xDFFF), and hex(n)_4 denotes the zero-padded four-digit hexadecimal representation of n.
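The piecewise rule and the surrogate formulas can be sketched in JavaScript as follows. This is a minimal illustration, not the tool's actual source: the names `nativeToAscii` and `hex4` are hypothetical, uppercase hex digits are assumed to match the reference table, and backslashes pass through unchanged as the rule for c ≤ 0x7F states.

```javascript
// Hypothetical helper: zero-padded, four-digit, uppercase hexadecimal.
function hex4(n) {
  return n.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical sketch of the native-to-ASCII direction.
function nativeToAscii(text) {
  let out = "";
  // for...of iterates by code point, so a supplementary character arrives whole.
  for (const ch of text) {
    const c = ch.codePointAt(0);
    if (c <= 0x7f) {
      out += ch; // ASCII passes through unchanged
    } else if (c <= 0xffff) {
      out += "\\u" + hex4(c); // single escape for the Basic Multilingual Plane
    } else {
      const offset = c - 0x10000;
      const high = Math.floor(offset / 0x400) + 0xd800; // H
      const low = (offset % 0x400) + 0xdc00; // L
      out += "\\u" + hex4(high) + "\\u" + hex4(low);
    }
  }
  return out;
}

console.log(nativeToAscii("café 中 😀"));
// prints: caf\u00E9 \u4E2D \uD83D\uDE00
```

Iterating with `for...of` rather than by index is a deliberate choice here: it yields whole code points, so the third branch of the rule is exercised directly instead of relying on the string already being split into UTF-16 code units.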

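The reverse operation uses the regex pattern /\\u([0-9A-Fa-f]{4})/g to locate each escape token and replaces it with String.fromCharCode(parseInt(hex, 16)), where hex is the captured four-digit group (not the full match, which still includes the \u prefix). Because JavaScript strings are sequences of UTF-16 code units, a decoded high surrogate immediately followed by a decoded low surrogate is read by the JavaScript engine as the original supplementary code point.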
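A minimal sketch of the reverse direction, assuming the regex quoted above. The function name `asciiToNative` is hypothetical and the tool's real code may differ.

```javascript
// Hypothetical sketch of the ASCII-to-native direction.
function asciiToNative(text) {
  // `hex` is the capture group (four hex digits); `match` would still include "\u".
  return text.replace(/\\u([0-9A-Fa-f]{4})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

const decoded = asciiToNative("caf\\u00E9 \\uD83D\\uDE00");
console.log(decoded); // prints: café 😀
// The two decoded surrogates sit side by side and read back as one code point.
console.log(decoded.codePointAt(5).toString(16)); // prints: 1f600
```

Tokens that do not match the pattern, such as `\u00GZ`, are simply left in place by `replace`.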

Reference Data

| Character | Code Point | Unicode Escape | Category | Script |
|---|---|---|---|---|
| é | U+00E9 | \u00E9 | Lowercase Letter | Latin |
| ñ | U+00F1 | \u00F1 | Lowercase Letter | Latin |
| ü | U+00FC | \u00FC | Lowercase Letter | Latin |
| 中 | U+4E2D | \u4E2D | CJK Ideograph | Han |
| 日 | U+65E5 | \u65E5 | CJK Ideograph | Han |
| 本 | U+672C | \u672C | CJK Ideograph | Han |
| 한 | U+D55C | \uD55C | Syllable | Hangul |
| 글 | U+AE00 | \uAE00 | Syllable | Hangul |
| Ω | U+03A9 | \u03A9 | Uppercase Letter | Greek |
| π | U+03C0 | \u03C0 | Lowercase Letter | Greek |
| Д | U+0414 | \u0414 | Uppercase Letter | Cyrillic |
| я | U+044F | \u044F | Lowercase Letter | Cyrillic |
| א | U+05D0 | \u05D0 | Letter | Hebrew |
| ع | U+0639 | \u0639 | Letter | Arabic |
| ₹ | U+20B9 | \u20B9 | Currency Symbol | Common |
| € | U+20AC | \u20AC | Currency Symbol | Common |
| £ | U+00A3 | \u00A3 | Currency Symbol | Common |
| © | U+00A9 | \u00A9 | Symbol | Common |
| ™ | U+2122 | \u2122 | Symbol | Common |
| ∞ | U+221E | \u221E | Math Symbol | Common |
| → | U+2192 | \u2192 | Arrow | Common |
| 😀 | U+1F600 | \uD83D\uDE00 | Emoji | Common (Surrogate Pair) |
| 🎵 | U+1F3B5 | \uD83C\uDFB5 | Emoji | Common (Surrogate Pair) |
| 𝄞 | U+1D11E | \uD834\uDD1E | Musical Symbol | Common (Surrogate Pair) |
| 𐍈 | U+10348 | \uD800\uDF48 | Letter | Gothic (Surrogate Pair) |

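As a hand check of the surrogate-pair entries in the reference data, the arithmetic for U+1F600 can be worked through with the formulas from the Formulas section. This is only a worked illustration; the division is integer division, rounding down.

```latex
\begin{aligned}
c - \texttt{0x10000} &= \texttt{0x1F600} - \texttt{0x10000} = \texttt{0xF600} \\
H &= \left\lfloor \texttt{0xF600} / \texttt{0x400} \right\rfloor + \texttt{0xD800}
   = \texttt{0x3D} + \texttt{0xD800} = \texttt{0xD83D} \\
L &= (\texttt{0xF600} \bmod \texttt{0x400}) + \texttt{0xDC00}
   = \texttt{0x200} + \texttt{0xDC00} = \texttt{0xDE00}
\end{aligned}
```

The result, \uD83D\uDE00, matches the escape listed for U+1F600.
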
Frequently Asked Questions

Characters with code points above U+FFFF cannot be represented by a single \uXXXX escape. The converter splits them into a UTF-16 surrogate pair: a high surrogate in the range 0xD800-0xDBFF followed by a low surrogate in 0xDC00-0xDFFF. For example, the emoji 😀 (U+1F600) becomes \uD83D\uDE00. The reverse operation detects adjacent surrogates and recombines them into the original code point.
No. All code points at or below U+007F (decimal 127) pass through unchanged, including control characters such as tab (\t), line feed (\n), and carriage return (\r). Only characters above this threshold are escaped. If you need to escape control characters as well, pre-process them separately with standard backslash notation.
The regex pattern strictly matches four hexadecimal digits [0-9A-Fa-f]. A sequence like \u00GZ does not match and is left as literal text in the output. The converter does not throw an error; it simply skips non-conforming patterns. Check the output for any remaining \u literals to identify malformed sequences.
Yes. Before Java 9, .properties files were read as ISO-8859-1, so characters that encoding cannot represent had to be written as \uXXXX escapes. Escaping every non-ASCII character, as this converter does, is always safe for such files. The output follows the same escaping scheme as the JDK native2ascii utility. For Java 9+ properties files that support UTF-8, conversion is optional but still useful for backward compatibility with older toolchains.
In native-to-ASCII mode, a literal backslash followed by "u" and four hex digits in the source text will be escaped character-by-character: the backslash becomes \u005C, and the rest remain ASCII. In reverse mode, the pattern \uXXXX is consumed and replaced. If your input intentionally contains literal \u sequences that should not be decoded, you need to escape the backslash first (\\u) before running the reverse conversion.
The converter runs entirely in the browser. Practical limits depend on available memory. Text inputs up to approximately 5 MB process in under one second on modern hardware. For files exceeding 10 MB, consider splitting them. The file upload feature reads the entire file into memory before conversion, so ensure your device has sufficient RAM for very large files.
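The advice above about checking the output for leftover \u literals can be automated. The following is a rough sketch under stated assumptions: `findMalformedEscapes` is a hypothetical helper, not part of the tool, and it flags every `\u` that is not followed by exactly four hexadecimal digits. It does not distinguish an intentionally escaped backslash (`\\u`) from a malformed token.

```javascript
// Hypothetical helper: list "\u" occurrences that the decoder's regex would skip.
function findMalformedEscapes(text) {
  const malformed = [];
  // Negative lookahead: "\u" NOT followed by four hex digits.
  const pattern = /\\u(?![0-9A-Fa-f]{4})/g;
  for (const m of text.matchAll(pattern)) {
    malformed.push({
      index: m.index, // position of the backslash in the input
      snippet: text.slice(m.index, m.index + 6), // up to six characters of context
    });
  }
  return malformed;
}

const report = findMalformedEscapes("ok \\u00E9 bad \\u00GZ end");
console.log(report.length, report[0].index); // prints: 1 14
```

Run against the input before decoding, or against the output afterwards; either way an empty array suggests every escape token was well-formed.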