ASCII to Unicode Converter
Convert ASCII text to Unicode code points (hex, decimal, binary, octal) and decode Unicode back to readable characters instantly.
| # | Char | Hex | Decimal | Octal | Binary | HTML | Block |
|---|---|---|---|---|---|---|---|
About
Character encoding errors corrupt data silently. A single misidentified code point can break JSON parsing, corrupt database fields, or render text as mojibake across systems that disagree on encoding. This tool converts between raw text and its Unicode code point representation using the U+XXXX notation defined by the Unicode Consortium and aligned with ISO/IEC 10646. It handles the complete Basic Multilingual Plane (BMP, U+0000 to U+FFFF) and the Supplementary Planes (up to U+10FFFF) via surrogate-aware parsing. Output formats include hexadecimal, decimal, octal, binary, and HTML entities. The reverse direction accepts mixed notation: U+XXXX, \uXXXX, hexadecimal entities &#xHH;, and decimal entities &#DDD;. Limitation: Unicode character names are provided for common blocks only. Normalization (NFC/NFD) is not applied.
Formulas
Each character in a string maps to an integer code point. To convert a character c to its hexadecimal Unicode representation, take codePoint(c), write it in base 16, left-pad it with zeros to at least 4 digits, and prefix the result with U+.
Here codePoint(c) extracts the scalar value via JavaScript's codePointAt(0), which correctly handles astral plane characters (code points > 0xFFFF) encoded as surrogate pairs in UTF-16. The padding rule: BMP characters (≤ U+FFFF) use 4 hex digits; supplementary characters use 5 or 6.
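The forward conversion can be sketched in a few lines. This is a minimal illustration, not the tool's actual source: the helper names toUPlus and textToCodePoints are hypothetical, and uppercase hex digits are assumed because that is the usual U+ convention.

```javascript
// Hypothetical helper: format one code point in U+XXXX notation.
function toUPlus(codePoint) {
  // padStart pads to at least 4 hex digits; values above 0xFFFF
  // already have 5 or 6 digits and are left unchanged.
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical helper: convert a whole string, one entry per code point.
function textToCodePoints(text) {
  // Array.from iterates a string by code point, so a surrogate pair
  // arrives as a single two-unit element and codePointAt(0) reads it whole.
  return Array.from(text, (ch) => toUPlus(ch.codePointAt(0)));
}

// "A", "é", and an astral-plane emoji written as an escape.
console.log(textToCodePoints("A\u00E9\u{1F600}"));
```

Iterating with Array.from rather than indexing by position matters here: indexing would visit each half of a surrogate pair separately and report two values in the U+D800 to U+DFFF range instead of one supplementary code point.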
For the reverse direction (Unicode → text), the parser matches multiple notations with a single alternation (union) pattern that covers U+XXXX, \uXXXX, &#xHH;, and &#DDD; tokens.
Each matched token is parsed to an integer n, validated against 0 ≤ n ≤ 0x10FFFF (excluding surrogates 0xD800 - 0xDFFF), then reconstituted via String.fromCodePoint(n).
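A rough sketch of that reverse path follows. The regular expression is an assumption written for illustration, not the pattern the tool ships with, and the names TOKEN, isScalarValue, and decodeUnicode are hypothetical. It also assumes that invalid tokens and any text between tokens are simply skipped, which the description above does not specify.

```javascript
// Assumed alternation, one branch per notation:
//   U+XXXX | \uXXXX | &#xHH; | &#DDD;
const TOKEN =
  /U\+([0-9A-Fa-f]{1,6})|\\u([0-9A-Fa-f]{4})|&#[xX]([0-9A-Fa-f]{1,6});|&#([0-9]{1,7});/g;

// True for 0..0x10FFFF, excluding the surrogate range 0xD800..0xDFFF.
function isScalarValue(n) {
  return (
    Number.isInteger(n) &&
    n >= 0 &&
    n <= 0x10ffff &&
    !(n >= 0xd800 && n <= 0xdfff)
  );
}

function decodeUnicode(input) {
  let out = "";
  for (const m of input.matchAll(TOKEN)) {
    // Groups 1 to 3 capture hex digits; group 4 captures decimal digits.
    const hex = m[1] ?? m[2] ?? m[3];
    const n = hex !== undefined ? parseInt(hex, 16) : parseInt(m[4], 10);
    // Assumption: out-of-range or surrogate values are dropped silently.
    if (isScalarValue(n)) out += String.fromCodePoint(n);
  }
  return out;
}

console.log(decodeUnicode("U+0048 \\u0069 &#x21; &#63;"));
```

Because surrogates are rejected individually, this sketch does not recombine an escaped pair such as \uD83D\uDE00 into one character; a parser that needs to accept that form would have to pair the two halves before validating.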
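Alternative output bases use standard positional notation: the same integer n is written with digits d_k ... d_0 in base b so that n = d_k·b^k + ... + d_1·b + d_0, where b = 10 for decimal, 8 for octal, and 2 for binary.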
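As a small illustration of the other output columns, JavaScript's Number.prototype.toString(radix) produces each base directly. The function name describeCodePoint is hypothetical, and the decimal form of the HTML entity is an assumption; the tool may emit the hexadecimal form instead.

```javascript
// Hypothetical helper: render one code point in each output base.
function describeCodePoint(n) {
  return {
    hex: n.toString(16).toUpperCase(),
    decimal: n.toString(10),
    octal: n.toString(8),
    binary: n.toString(2),
    // Assumed entity style: decimal numeric character reference.
    html: "&#" + n + ";",
  };
}

// "A" is code point 65: 41 in hex, 101 in octal, 1000001 in binary.
console.log(describeCodePoint("A".codePointAt(0)));
```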
Reference Data
| Block Name | Range | Characters | Common Usage |
|---|---|---|---|
| Basic Latin (ASCII) | U+0000 - 007F | 128 | English letters, digits, punctuation |
| Latin-1 Supplement | U+0080 - 00FF | 128 | Accented letters (é, ü, ñ) |
| Latin Extended-A | U+0100 - 017F | 128 | Central/Eastern European scripts |
| Greek and Coptic | U+0370 - 03FF | 135 | α, β, γ, Σ math symbols |
| Cyrillic | U+0400 - 04FF | 256 | Russian, Ukrainian, Bulgarian |
| Arabic | U+0600 - 06FF | 256 | Arabic script languages |
| Devanagari | U+0900 - 097F | 128 | Hindi, Sanskrit, Marathi |
| CJK Unified Ideographs | U+4E00 - 9FFF | 20,992 | Chinese, Japanese Kanji, Korean Hanja |
| Hiragana | U+3040 - 309F | 93 | Japanese phonetic script |
| Katakana | U+30A0 - 30FF | 96 | Japanese foreign loanwords |
| Hangul Syllables | U+AC00 - D7AF | 11,172 | Korean syllable blocks |
| General Punctuation | U+2000 - 206F | 111 | Em dash, ellipsis, non-breaking spaces |
| Currency Symbols | U+20A0 - 20CF | 33 | €, ₹, ₿ (£ and ¥ sit in Latin-1 Supplement) |
| Mathematical Operators | U+2200 - 22FF | 256 | ∫ and other math operators |
| Arrows | U+2190 - 21FF | 112 | Directional arrows |
| Box Drawing | U+2500 - 257F | 128 | Terminal/console UI borders |
| Emoticons | U+1F600 - 1F64F | 80 | Emoji faces |
| Misc Symbols & Pictographs | U+1F300 - 1F5FF | 768 | 🔥 and other common emoji |
| Private Use Area | U+E000 - F8FF | 6,400 | Custom font glyphs (icon fonts) |
| Surrogates (reserved) | U+D800 - DFFF | 2,048 | UTF-16 encoding pairs (not characters) |
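
The ranges in this table are enough to label a code point with its block. The sketch below uses a handful of rows copied from the table; the BLOCKS array and the blockOf function are hypothetical names, and the "Unknown" fallback is an assumption about how unlisted ranges might be reported.

```javascript
// A few rows from the reference table, as inclusive numeric ranges.
const BLOCKS = [
  { name: "Basic Latin (ASCII)", start: 0x0000, end: 0x007f },
  { name: "Latin-1 Supplement", start: 0x0080, end: 0x00ff },
  { name: "Cyrillic", start: 0x0400, end: 0x04ff },
  { name: "CJK Unified Ideographs", start: 0x4e00, end: 0x9fff },
  { name: "Surrogates (reserved)", start: 0xd800, end: 0xdfff },
  { name: "Emoticons", start: 0x1f600, end: 0x1f64f },
];

// Linear scan is fine for a short list; blocks never overlap.
function blockOf(codePoint) {
  const hit = BLOCKS.find(
    (b) => codePoint >= b.start && codePoint <= b.end
  );
  return hit ? hit.name : "Unknown";
}

console.log(blockOf("A".codePointAt(0)));
console.log(blockOf(0x1f600));
```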