Separate code points with spaces, commas, newlines, or tabs.
About

Unicode assigns a unique numerical identifier, called a code point, to every character across all writing systems, technical symbols, and emoji. The code space runs from U+0000 to U+10FFFF, which is 1,114,112 positions divided into 17 planes. Misreading a code point's notation or mishandling supplementary-plane characters (anything above U+FFFF) leads to corrupted output, replacement characters (U+FFFD), or silent data loss in databases and APIs. This tool parses six common code point notations (U+XXXX, 0xXXXX, decimal integers, &#xHHHH;, &#DDD;, and \uXXXX), validates each against the legal Unicode range, rejects surrogate code points (U+D800 - U+DFFF), and reconstructs the original text using String.fromCodePoint. Correct rendering also depends on your browser and operating system having suitable fonts installed. Missing glyphs display as placeholder boxes, which signal a font gap and not a conversion error.


Formulas

Each input token is matched against format-specific regular expressions. The extracted hexadecimal or decimal string is parsed to an integer code point value cp. The conversion rule is:

cp = parseInt(hexStr, 16) for hex formats
cp = parseInt(decStr, 10) for decimal formats
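One caveat with these rules is that parseInt stops at the first character that is not a digit in the given base. As a result, parseInt("12G", 16) quietly returns 18 instead of failing. The TypeScript sketch below assumes the digits have already been separated from their prefix (U+, 0x, &#x, and so on). It checks the whole digit string before calling parseInt. The helper name parseDigits is hypothetical and not part of the tool.

```typescript
// Hypothetical helper: parse a bare digit string in base 16 or base 10.
// parseInt alone is too lenient (parseInt("12G", 16) === 18), so the whole
// string is validated first and null is returned on any stray character.
function parseDigits(digits: string, base: 16 | 10): number | null {
  const pattern = base === 16 ? /^[0-9A-Fa-f]+$/ : /^[0-9]+$/;
  if (!pattern.test(digits)) {
    return null;
  }
  return parseInt(digits, base);
}

console.log(parseDigits("1F600", 16)); // 128512
console.log(parseDigits("65", 10)); // 65
console.log(parseDigits("12G", 16)); // null
```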

Validation requires:

$$
0 \le \mathit{cp} \le 1{,}114{,}111 \quad (\texttt{0x10FFFF})
$$

$$
\mathit{cp} \notin [\texttt{0xD800}, \texttt{0xDFFF}] \quad \text{(the surrogate range is illegal)}
$$
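The two conditions combine into one predicate for what Unicode calls a scalar value. The following is a minimal TypeScript sketch. The function name isScalarValue is illustrative. The Number.isInteger check is an added assumption that guards against fractional or NaN input left by a failed parse.

```typescript
// True when cp is a Unicode scalar value: an integer in [0, 0x10FFFF]
// that lies outside the surrogate range [0xD800, 0xDFFF].
function isScalarValue(cp: number): boolean {
  return (
    Number.isInteger(cp) && // rejects NaN and fractions
    cp >= 0 &&
    cp <= 0x10ffff &&
    !(cp >= 0xd800 && cp <= 0xdfff)
  );
}

console.log(isScalarValue(0x1f600), isScalarValue(0xd800), isScalarValue(0x110000)); // true false false
```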

Valid code points are converted to characters via String.fromCodePoint(cp). For supplementary plane characters (cp > 0xFFFF), this function internally creates a UTF-16 surrogate pair:

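$$
\begin{aligned}
\mathit{cp}' &= \mathit{cp} - \texttt{0x10000} \\
\mathit{hi} &= \texttt{0xD800} + (\mathit{cp}' \gg 10) \\
\mathit{lo} &= \texttt{0xDC00} + (\mathit{cp}' \mathbin{\&} \texttt{0x3FF})
\end{aligned}
$$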

Where hi is the high surrogate and lo is the low surrogate. UTF-8 byte count per code point follows the encoding scheme:

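$$
\text{bytes}(\mathit{cp}) =
\begin{cases}
1 & \text{if } \mathit{cp} \le \texttt{0x7F} \\
2 & \text{if } \texttt{0x80} \le \mathit{cp} \le \texttt{0x7FF} \\
3 & \text{if } \texttt{0x800} \le \mathit{cp} \le \texttt{0xFFFF} \\
4 & \text{if } \texttt{0x10000} \le \mathit{cp} \le \texttt{0x10FFFF}
\end{cases}
$$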
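The surrogate-pair and UTF-8 length formulas translate directly into code. This TypeScript sketch uses hypothetical function names. It does three things:

- computes a surrogate pair by hand;
- compares the result with the code units that String.fromCodePoint actually produces;
- returns the UTF-8 length.

utf8ByteLength assumes its input has already passed validation, so it does not re-check the range.

```typescript
// Split a supplementary code point (0x10000..0x10FFFF) into its UTF-16
// surrogate pair: cp' = cp - 0x10000, hi = 0xD800 + (cp' >> 10),
// lo = 0xDC00 + (cp' & 0x3FF).
function toSurrogatePair(cp: number): [number, number] {
  if (cp < 0x10000 || cp > 0x10ffff) {
    throw new RangeError("not a supplementary code point: " + cp);
  }
  const offset = cp - 0x10000; // cp', a 20-bit value
  const hi = 0xd800 + (offset >> 10); // top 10 bits
  const lo = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [hi, lo];
}

// UTF-8 length of one code point, following the piecewise rule.
// Assumes cp is already a valid scalar value.
function utf8ByteLength(cp: number): number {
  if (cp <= 0x7f) return 1;
  if (cp <= 0x7ff) return 2;
  if (cp <= 0xffff) return 3;
  return 4;
}

const emojiPair = toSurrogatePair(0x1f600);
console.log(emojiPair.map((u) => u.toString(16)).join(" ")); // d83d de00

// Cross-check against the code units the engine builds internally.
const emoji = String.fromCodePoint(0x1f600);
console.log(emoji.charCodeAt(0) === emojiPair[0] && emoji.charCodeAt(1) === emojiPair[1]); // true
console.log(utf8ByteLength(0x1f600)); // 4
```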

Reference Data

| Plane | Range | Name | Code points | Common content |
|---|---|---|---|---|
| 0 | U+0000 - U+FFFF | Basic Multilingual Plane (BMP) | 65,536 | Latin, Cyrillic, Greek, CJK, common symbols |
| 1 | U+10000 - U+1FFFF | Supplementary Multilingual Plane | 65,536 | Emoji, historic scripts, musical symbols |
| 2 | U+20000 - U+2FFFF | Supplementary Ideographic Plane | 65,536 | CJK Unified Ideographs Extension B |
| 3 | U+30000 - U+3FFFF | Tertiary Ideographic Plane | 65,536 | CJK Extensions G and H |
| 4-13 | U+40000 - U+DFFFF | Unassigned | 655,360 | Reserved for future use |
| 14 | U+E0000 - U+EFFFF | Supplementary Special-purpose Plane | 65,536 | Tag characters, variation selectors |
| 15 | U+F0000 - U+FFFFF | Supplementary Private Use Area-A | 65,536 | Private-use characters |
| 16 | U+100000 - U+10FFFF | Supplementary Private Use Area-B | 65,536 | Private-use characters |

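Every plane holds exactly 0x10000 (65,536) code points, so a code point's plane number is the value shifted right by 16 bits. Here is a small TypeScript sketch. The PLANE_NAMES lookup just mirrors the Name column of the plane table and is not an official API.

```typescript
// Plane number = cp >> 16, equivalently Math.floor(cp / 0x10000).
function planeOf(cp: number): number {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) {
    throw new RangeError("outside the Unicode code space: " + cp);
  }
  return cp >> 16;
}

// Mirrors the table's Name column; planes 4-13 have no entry.
const PLANE_NAMES: { [plane: number]: string } = {
  0: "Basic Multilingual Plane (BMP)",
  1: "Supplementary Multilingual Plane",
  2: "Supplementary Ideographic Plane",
  3: "Tertiary Ideographic Plane",
  14: "Supplementary Special-purpose Plane",
  15: "Supplementary Private Use Area-A",
  16: "Supplementary Private Use Area-B",
};

function planeName(cp: number): string {
  const name = PLANE_NAMES[planeOf(cp)];
  return name !== undefined ? name : "Unassigned";
}

console.log(planeOf(0x1f600), planeName(0x1f600)); // 1 Supplementary Multilingual Plane
```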

| Input format | Example | Regex pattern | Base | Notes |
|---|---|---|---|---|
| U+XXXX | U+0041 | U\+[0-9A-Fa-f]{1,6} | Hexadecimal | Most common Unicode notation |
| 0xXXXX | 0x0041 | 0x[0-9A-Fa-f]{1,6} | Hexadecimal | Programming hex literal |
| Decimal | 65 | [0-9]+ | Decimal | Raw integer code point value |
| &#xHHHH; | &#x41; | &#x[0-9A-Fa-f]+; | Hexadecimal | HTML hex character reference |
| &#DDD; | &#65; | &#[0-9]+; | Decimal | HTML decimal character reference |
| \uXXXX | \u0041 | \\u[0-9A-Fa-f]{4} | Hexadecimal | JavaScript/Java escape (one UTF-16 code unit, so BMP only) |
| \u{XXXXX} | \u{1F600} | \\u\{[0-9A-Fa-f]{1,6}\} | Hexadecimal | ES6+ extended escape (all planes) |

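The "BMP only" note on the \uXXXX row matters in practice because that escape always consumes exactly four hex digits. The TypeScript snippet below uses JavaScript's own string-literal escapes to show the language behavior behind the two notations. It does not reproduce the tool's parser.

```typescript
// A \uXXXX escape takes exactly four hex digits, so "\u1F600" is
// U+1F60 followed by the digit "0", not U+1F600.
const truncated = "\u1F600";
console.log(truncated.length, truncated === "\u1F60" + "0"); // 2 true

// A supplementary character needs either the braced ES2015 form or two
// \uXXXX escapes that together form a surrogate pair.
const braced = "\u{1F600}";
const surrogateEscapes = "\uD83D\uDE00";
console.log(braced === surrogateEscapes); // true
console.log(braced.codePointAt(0) === 0x1f600); // true
```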

| Code point | Character | Name | Block | UTF-8 bytes |
|---|---|---|---|---|
| U+0041 | A | Latin Capital Letter A | Basic Latin | 1 |
| U+00E9 | Γ© | Latin Small Letter E with Acute | Latin-1 Supplement | 2 |
| U+4E16 | δΈ– | CJK Unified Ideograph-4E16 | CJK Unified Ideographs | 3 |
| U+0410 | А | Cyrillic Capital Letter A | Cyrillic | 2 |
| U+2603 | β˜ƒ | Snowman | Miscellaneous Symbols | 3 |
| U+1F600 | πŸ˜€ | Grinning Face | Emoticons (Plane 1) | 4 |
| U+1F4A9 | πŸ’© | Pile of Poo | Miscellaneous Symbols and Pictographs (Plane 1) | 4 |
| U+0000 | NUL | Null Character | Basic Latin (C0 Controls) | 1 |
| U+FEFF | BOM | Zero Width No-Break Space (Byte Order Mark) | Arabic Presentation Forms-B | 3 |
| U+FFFD | οΏ½ | Replacement Character | Specials | 3 |
| U+200B | (invisible) | Zero Width Space | General Punctuation | 3 |
| U+20AC | € | Euro Sign | Currency Symbols | 3 |
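The UTF-8 bytes column can be checked in any modern browser or in Node.js, where TextEncoder is a global that always encodes to UTF-8. The minimal TypeScript sketch below prints the actual byte sequences behind several table rows. utf8Hex is a hypothetical helper name.

```typescript
// TextEncoder only produces UTF-8, so the encoded length of a
// one-character string equals the table's "UTF-8 bytes" column.
const utf8 = new TextEncoder();

// Hypothetical helper: the UTF-8 bytes of one code point as uppercase hex.
function utf8Hex(cp: number): string[] {
  const bytes = Array.from(utf8.encode(String.fromCodePoint(cp)));
  return bytes.map((b) => ("0" + b.toString(16).toUpperCase()).slice(-2));
}

for (const cp of [0x41, 0xe9, 0x4e16, 0x20ac, 0x1f600]) {
  const hex = cp.toString(16).toUpperCase();
  const label = "U+" + (hex.length < 4 ? "0000".slice(hex.length) + hex : hex);
  const bytes = utf8Hex(cp);
  console.log(label, bytes.length, bytes.join(" "));
}
// U+0041 1 41
// U+00E9 2 C3 A9
// U+4E16 3 E4 B8 96
// U+20AC 3 E2 82 AC
// U+1F600 4 F0 9F 98 80
```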

Frequently Asked Questions

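Code points U+D800 - U+DFFF are reserved for UTF-16 surrogate pairs. They are not valid Unicode scalar values and cannot represent standalone characters. String.fromCodePoint does not reject them. Called with a surrogate value, it returns a string containing a lone surrogate. That string is ill-formed UTF-16 and typically turns into U+FFFD when the text is encoded to UTF-8 (TextEncoder, for example, performs this substitution). String.fromCodePoint throws a RangeError only for non-integers and for values outside the range 0 to 0x10FFFF. This tool therefore detects and flags surrogates itself before conversion, so malformed text never reaches the output.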
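Surrogates and out-of-range values are easy to tell apart in practice. This TypeScript snippet, with illustrative variable names, demonstrates three things:

- String.fromCodePoint accepts 0xD800.
- TextEncoder replaces the resulting lone surrogate with the UTF-8 bytes of U+FFFD.
- Only the out-of-range value raises a RangeError.

This is why an explicit surrogate check is needed before conversion.

```typescript
// No exception: the result is a one-code-unit string holding a lone surrogate.
const lone = String.fromCodePoint(0xd800);
console.log(lone.length); // 1

// Encoding to UTF-8 silently substitutes U+FFFD (bytes EF BF BD).
const loneUtf8 = Array.from(new TextEncoder().encode(lone));
console.log(loneUtf8.map((b) => b.toString(16)).join(" ")); // ef bf bd

// Only non-integers and values outside 0..0x10FFFF throw.
let rangeErrorThrown = false;
try {
  String.fromCodePoint(0x110000);
} catch (e) {
  rangeErrorThrown = e instanceof RangeError;
}
console.log(rangeErrorThrown); // true
```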
Characters above U+FFFF (such as πŸ˜€ at U+1F600) reside on supplementary planes 1-16. JavaScript internally represents them as two UTF-16 code units (a surrogate pair), but String.fromCodePoint handles this transparently. The tool correctly converts any code point up to U+10FFFF regardless of plane. Rendering depends on font support in your browser and operating system.
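The surrogate pair still shows up through string length and indexing, which matters when converted text is stored, counted, or truncated. Here is a TypeScript illustration. codePointsOf is a hypothetical helper built on the string iterator, which walks code points instead of UTF-16 code units.

```typescript
// Array.from(string, fn) uses the string iterator, which yields whole code points.
function codePointsOf(text: string): number[] {
  return Array.from(text, (ch) => ch.codePointAt(0) as number);
}

const grin = String.fromCodePoint(0x1f600);
console.log(grin.length); // 2 (UTF-16 code units)
console.log(codePointsOf(grin).length); // 1 (code points)
// Index 1 lands on the low surrogate, so codePointAt returns it on its own.
console.log((grin.codePointAt(1) as number).toString(16)); // de00
```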
Yes. When the format selector is set to Auto-Detect, the parser identifies each token's format independently using regex matching. You can freely mix U+0041, 0x42, 67, &#x44;, &#69;, and \u0046 in the same input. Each token is parsed according to its detected format. Ambiguous tokens (e.g., 65 could be decimal or a bare hex value) are resolved by the auto-detect priority: HTML entities first, then the U+ prefix, then the 0x prefix, then \u escapes, and finally bare numbers, which are read as decimal.
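The detection order can be sketched as an ordered list of anchored regular expressions tried from first to last, after the input is split on spaces, commas, newlines, or tabs. This TypeScript sketch is not the tool's source and makes several assumptions:

- The patterns follow the input format table, and the braced \u{...} form is checked alongside \uXXXX.
- Only an uppercase U+ prefix and a lowercase 0x prefix are recognized.
- Range and surrogate validation are left to a separate step.

Because every pattern here is anchored to the whole token, at most one can match. The order would start to matter only with looser, unanchored patterns.

```typescript
// Token formats, in the FAQ's priority order: HTML entities, U+, 0x,
// backslash-u escapes, then bare decimal numbers.
type Format = "html-hex" | "html-dec" | "u-plus" | "hex-0x" | "escape" | "decimal";

interface Rule {
  format: Format;
  pattern: RegExp; // anchored to the whole token; digits in group 1
  base: 16 | 10;
}

const RULES: Rule[] = [
  { format: "html-hex", pattern: /^&#x([0-9A-Fa-f]+);$/, base: 16 },
  { format: "html-dec", pattern: /^&#([0-9]+);$/, base: 10 },
  { format: "u-plus", pattern: /^U\+([0-9A-Fa-f]{1,6})$/, base: 16 },
  { format: "hex-0x", pattern: /^0x([0-9A-Fa-f]{1,6})$/, base: 16 },
  { format: "escape", pattern: /^\\u\{([0-9A-Fa-f]{1,6})\}$/, base: 16 },
  { format: "escape", pattern: /^\\u([0-9A-Fa-f]{4})$/, base: 16 },
  { format: "decimal", pattern: /^([0-9]+)$/, base: 10 },
];

interface Token {
  text: string;
  format: Format;
  cp: number; // not yet range- or surrogate-checked
}

// Returns the first matching rule's result, or null for an unrecognized token.
function detectToken(text: string): Token | null {
  for (const rule of RULES) {
    const match = rule.pattern.exec(text);
    if (match !== null) {
      return { text, format: rule.format, cp: parseInt(match[1], rule.base) };
    }
  }
  return null;
}

// Split on runs of whitespace (spaces, tabs, newlines) and commas.
function tokenize(input: string): string[] {
  return input.split(/[\s,]+/).filter((t) => t.length > 0);
}

const mixed = tokenize("U+0041, 0x42 67 &#x44; &#69; \\u0046").map(detectToken);
console.log(mixed.map((t) => (t === null ? "?" : t.format)).join(" "));
// u-plus hex-0x decimal html-hex html-dec escape

const decoded = mixed
  .filter((t): t is Token => t !== null)
  .map((t) => String.fromCodePoint(t.cp))
  .join("");
console.log(decoded); // ABCDEF
```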
Control characters in the range U+0000 - U+001F and U+007F - U+009F are valid Unicode code points and will convert successfully. However, they are non-printable. The output will contain the character but it may appear invisible or cause formatting side-effects (e.g., line breaks for U+000A). The detail table marks these as control characters and displays their Unicode name instead of attempting to render a glyph.
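The two control ranges are small enough to test directly. This TypeScript sketch assumes a hypothetical display layer that replaces each control character with a visible <U+XXXX> placeholder. The placeholder format and helper names are illustrative only, since the tool itself shows Unicode names instead.

```typescript
// C0 controls (U+0000..U+001F), DEL (U+007F) and C1 controls (U+0080..U+009F):
// together the 65 code points whose General_Category is Cc.
// Assumes cp is non-negative.
function isControl(cp: number): boolean {
  return cp <= 0x1f || (cp >= 0x7f && cp <= 0x9f);
}

// Hypothetical display helper: swap each control character for <U+XXXX>.
function showControls(text: string): string {
  return Array.from(text)
    .map((ch) => {
      const cp = ch.codePointAt(0) as number;
      if (!isControl(cp)) return ch;
      return "<U+" + ("000" + cp.toString(16).toUpperCase()).slice(-4) + ">";
    })
    .join("");
}

console.log(showControls("A\nB\u0085C")); // A<U+000A>B<U+0085>C
```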
Yes. Unicode combining sequences are order-dependent. An accented letter like Γ© can be represented as a single code point (U+00E9) or as two: U+0065 (e) followed by U+0301 (combining acute accent). Flag emoji require specific Regional Indicator pairs (e.g., U+1F1FA U+1F1F8 for πŸ‡ΊπŸ‡Έ). If the order is wrong or pairs are incomplete, the characters render individually rather than as a combined glyph.
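Both canonical equivalence and order dependence can be checked with String.prototype.normalize. The short TypeScript illustration below uses the code points from this answer. The variable names are illustrative.

```typescript
// Γ© as one precomposed code point vs. base letter + combining mark.
const precomposed = "\u00E9";
const decomposed = "e\u0301"; // U+0065 then U+0301 COMBINING ACUTE ACCENT
const reversed = "\u0301e"; // mark placed before its base letter

console.log(precomposed === decomposed); // false: different code point sequences
console.log(decomposed.normalize("NFC") === precomposed); // true: canonically equivalent
console.log(reversed.normalize("NFC") === precomposed); // false: order matters

// A flag is two Regional Indicator symbols: U (U+1F1FA) then S (U+1F1F8).
const usFlag = String.fromCodePoint(0x1f1fa, 0x1f1f8);
console.log(Array.from(usFlag).length, usFlag.length); // 2 4
```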
The conversion itself succeeded, and the code point is valid. An empty box (β–‘ or β–―) means your system lacks a font containing a glyph for that code point. This is common for rare CJK extensions (Planes 2 and 3), historic scripts, and newly added emoji. Installing a broad-coverage font family such as Noto, or adjusting your OS fallback font chain, resolves most cases. A question-mark diamond (οΏ½) is different. It is the replacement character U+FFFD itself, which decoders insert for malformed input, so it points to an encoding problem rather than a missing font. The detail table still shows the correct code point value and Unicode name.