Supports all Unicode including emoji, CJK, and control characters.

About

Every character rendered on screen maps to a numeric code point in the Unicode standard. Misidentifying a code point leads to encoding corruption, broken internationalization, and silent data loss in databases that reject out-of-range values. This tool converts any input character - including supplementary plane symbols like emoji and CJK ideographs beyond U+FFFF - into its precise decimal, hexadecimal, octal, binary, or HTML entity representation using codePointAt rather than the legacy charCodeAt, which fails on surrogate pairs. It handles the full Unicode range from U+0000 to U+10FFFF.
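A minimal sketch of the core lookup, assuming Node or a browser console (`toCodePoint` is a hypothetical helper name, not part of the tool):

```javascript
// Read the first code point of a string; codePointAt handles
// supplementary-plane characters that charCodeAt would split.
const toCodePoint = (ch) => ch.codePointAt(0);

const cp = toCodePoint("😀");        // supplementary-plane character
console.log(cp);                      // 128512
console.log("0x" + cp.toString(16)); // "0x1f600"
console.log("😀".charCodeAt(0));     // 55357 -- only the high surrogate
```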

Note: this tool operates on Unicode scalar values. Grapheme clusters such as flag sequences or skin-tone modified emoji are not treated as single units; each constituent code point is converted separately. Likewise, in code-unit mode each half of a surrogate pair is resolved individually. For canonical composition or decomposition, apply a normalization step (NFC/NFD) before conversion. Pro tip: when debugging encoding issues, compare the hex output here against the raw bytes stored in your database to confirm whether the mismatch is at the application layer or the storage layer.
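The normalization point can be seen directly: NFC and NFD forms of the same visible text yield different code-point sequences (a sketch, using only standard `String.prototype.normalize`):

```javascript
// Map every character of a string to its code point.
const codePoints = (s) => Array.from(s, (ch) => ch.codePointAt(0));

const nfc = "é".normalize("NFC"); // single precomposed code point
const nfd = "é".normalize("NFD"); // base letter + combining accent
console.log(codePoints(nfc)); // [233]       -> U+00E9
console.log(codePoints(nfd)); // [101, 769]  -> U+0065, U+0301
```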


Formulas

The core conversion extracts the Unicode code point of each character and represents it in the target radix. For the character at index i in the input string s:

cp = codePointAt(s, i)

The code point cp is then converted to the desired output format:

Decimal: cp10 = cp.toString(10)
Hexadecimal: cp16 = "0x" + cp.toString(16)
Octal: cp8 = "0o" + cp.toString(8)
Binary: cp2 = cp.toString(2)
HTML Entity: "&#" + cp10 + ";"

Where cp = the Unicode code point (a non-negative integer in the range 0 to 1,114,111 or 0x10FFFF), s = the input string, and i = the character index. The function codePointAt correctly handles supplementary plane characters (code points > 0xFFFF) that are stored as surrogate pairs in JavaScript's UTF-16 internal encoding, unlike the legacy charCodeAt which returns the individual surrogate values.
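The formulas above can be sketched as a single function (`formatCodePoint` is a hypothetical helper name):

```javascript
// Render one code point in every output format the tool describes.
function formatCodePoint(cp) {
  return {
    decimal: cp.toString(10),
    hex: "0x" + cp.toString(16).toUpperCase(),
    octal: "0o" + cp.toString(8),
    binary: cp.toString(2),
    entity: "&#" + cp.toString(10) + ";",
  };
}

console.log(formatCodePoint("π".codePointAt(0)));
// { decimal: '960', hex: '0x3C0', octal: '0o1700',
//   binary: '1111000000', entity: '&#960;' }
```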

For reverse conversion (code point → character):

c = fromCodePoint(cp)
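In JavaScript this is `String.fromCodePoint`; `Number()` already understands the 0x/0o/0b prefixes, so parsing prefixed input needs no extra code (a sketch):

```javascript
// Rebuild characters from numeric code points.
console.log(String.fromCodePoint(0x1f600));          // "😀"
console.log(String.fromCodePoint(Number("0x4E16"))); // "世"
console.log(String.fromCodePoint(72, 105));          // "Hi"
```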

Reference Data

| Character | Name | Decimal | Hex | HTML Entity | UTF-8 Bytes |
|---|---|---|---|---|---|
| NUL | Null | 0 | 0x00 | | 00 |
| TAB | Horizontal Tab | 9 | 0x09 | &#9; | 09 |
| LF | Line Feed | 10 | 0x0A | &#10; | 0A |
| CR | Carriage Return | 13 | 0x0D | &#13; | 0D |
| (space) | Space | 32 | 0x20 | &#32; | 20 |
| ! | Exclamation Mark | 33 | 0x21 | &#33; | 21 |
| 0 | Digit Zero | 48 | 0x30 | &#48; | 30 |
| A | Latin Capital A | 65 | 0x41 | &#65; | 41 |
| Z | Latin Capital Z | 90 | 0x5A | &#90; | 5A |
| a | Latin Small A | 97 | 0x61 | &#97; | 61 |
| z | Latin Small Z | 122 | 0x7A | &#122; | 7A |
| ~ | Tilde | 126 | 0x7E | &#126; | 7E |
| DEL | Delete | 127 | 0x7F | | 7F |
| © | Copyright Sign | 169 | 0xA9 | &#169; | C2 A9 |
| ® | Registered Sign | 174 | 0xAE | &#174; | C2 AE |
| ñ | Latin Small N with Tilde | 241 | 0xF1 | &#241; | C3 B1 |
| α | Greek Small Alpha | 945 | 0x3B1 | &#945; | CE B1 |
| π | Greek Small Pi | 960 | 0x3C0 | &#960; | CF 80 |
| ☃ | Snowman | 9731 | 0x2603 | &#9731; | E2 98 83 |
| ❤ | Heavy Black Heart | 10084 | 0x2764 | &#10084; | E2 9D A4 |
| 世 | CJK Ideograph (World) | 19990 | 0x4E16 | &#19990; | E4 B8 96 |
| 😀 | Grinning Face Emoji | 128512 | 0x1F600 | &#128512; | F0 9F 98 80 |
| 💡 | Electric Light Bulb | 128161 | 0x1F4A1 | &#128161; | F0 9F 92 A1 |
| 🌍 | Earth Globe Europe-Africa | 127757 | 0x1F30D | &#127757; | F0 9F 8C 8D |

Frequently Asked Questions

What is the difference between charCodeAt and codePointAt?

The method charCodeAt returns the UTF-16 code unit at a given index, which is a value between 0 and 65535 (0xFFFF). Characters outside the Basic Multilingual Plane (BMP), such as emoji, are encoded as two surrogate code units. charCodeAt returns only one surrogate at a time, giving an incorrect result. In contrast, codePointAt correctly reads both surrogates and returns the full code point up to 0x10FFFF. This tool uses codePointAt by default to ensure accuracy across the entire Unicode range.
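The difference is easy to see in a console (a minimal sketch):

```javascript
// charCodeAt sees the surrogate pair; codePointAt sees the full scalar.
const s = "😀";
console.log(s.length);         // 2 -- two UTF-16 code units
console.log(s.charCodeAt(0));  // 55357 (0xD83D, high surrogate)
console.log(s.charCodeAt(1));  // 56832 (0xDE00, low surrogate)
console.log(s.codePointAt(0)); // 128512 (0x1F600)
```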

How does the tool handle emoji and other supplementary plane characters?

Emoji and supplementary plane characters have code points above 0xFFFF. In JavaScript's internal UTF-16 encoding, they occupy two 16-bit code units (a surrogate pair). This tool iterates using the Symbol.iterator protocol (via Array.from), which correctly yields one string element per code point rather than per code unit. Each element is then passed to codePointAt(0) to extract the full scalar value. For example, the grinning face emoji yields code point 128512 (0x1F600), not two separate surrogate values.
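The iteration strategy described above, as a sketch:

```javascript
// Array.from iterates per code point, not per code unit.
const input = "A😀B";
console.log(input.length);      // 4 code units
console.log(Array.from(input)); // ["A", "😀", "B"] -- 3 code points
console.log(Array.from(input, (ch) => ch.codePointAt(0)));
// [65, 128512, 66]
```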

Is the binary output the same as the UTF-8 encoding?

No. The binary output shown is the direct base-2 representation of the Unicode code point integer. UTF-8 is a variable-length encoding that adds framing bits: a 2-byte UTF-8 sequence uses the pattern 110xxxxx 10xxxxxx, a 3-byte sequence uses 1110xxxx 10xxxxxx 10xxxxxx, and a 4-byte sequence uses 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. The code point binary and the UTF-8 binary are related but not identical. This tool outputs the code point value in the selected radix, not the wire encoding.
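A sketch of the distinction for U+00F1 (ñ), using the standard `TextEncoder` to obtain the wire bytes:

```javascript
// Code point binary vs. UTF-8 wire bytes.
const cp = "ñ".codePointAt(0);
console.log(cp.toString(2)); // "11110001" -- the code point, 241

const bytes = new TextEncoder().encode("ñ"); // Uint8Array [0xC3, 0xB1]
console.log([...bytes].map((b) => b.toString(2).padStart(8, "0")));
// ["11000011", "10110001"] -- 110xxxxx 10xxxxxx framing around the same bits
```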

Can I convert a code point back to a character?

Yes. The tool includes a reverse mode. Enter a numeric code point (decimal, or prefixed with 0x for hex, 0o for octal, or 0b for binary) in the reverse input field. The tool calls String.fromCodePoint to reconstruct the character. Multiple code points can be separated by spaces or commas. Invalid values outside the range 0 to 0x10FFFF, or lone surrogates (0xD800 to 0xDFFF), will produce an error.
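The validation described above can be sketched as follows (`parseCodePoint` is a hypothetical helper name; the surrogate check is explicit because String.fromCodePoint itself accepts lone surrogates):

```javascript
// Parse a prefixed or decimal code point and reject invalid values.
function parseCodePoint(text) {
  const cp = Number(text.trim()); // accepts 0x.., 0o.., 0b.., or decimal
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) {
    throw new RangeError("code point out of range: " + text);
  }
  if (cp >= 0xd800 && cp <= 0xdfff) {
    throw new RangeError("lone surrogate: " + text);
  }
  return String.fromCodePoint(cp);
}

console.log(parseCodePoint("0x1F4A1")); // "💡"
console.log(parseCodePoint("65"));      // "A"
```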

How are control characters displayed?

Control characters have no visible glyph. The tool displays a descriptive label such as NUL, SOH, STX, ETX, TAB, LF, CR, ESC, etc., in place of the raw character. The numeric code point values are still accurate. This prevents the output table from appearing broken or having invisible cells. The Unicode name is sourced from an internal lookup table covering all C0 and C1 control codes (0x00–0x1F and 0x7F–0x9F).
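A sketch of that lookup (abbreviated to a few labels here; the tool's internal table covers all C0 and C1 codes, and `displayChar` is a hypothetical helper name):

```javascript
// Replace invisible control characters with descriptive labels.
const CONTROL_LABELS = { 0x00: "NUL", 0x09: "TAB", 0x0a: "LF", 0x0d: "CR", 0x1b: "ESC", 0x7f: "DEL" };

function displayChar(cp) {
  const isControl = cp <= 0x1f || (cp >= 0x7f && cp <= 0x9f);
  return isControl ? (CONTROL_LABELS[cp] ?? "CTRL") : String.fromCodePoint(cp);
}

console.log(displayChar(10)); // "LF"
console.log(displayChar(65)); // "A"
```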

Does the tool output named or numeric HTML entities?

The tool outputs numeric decimal entities (&#cp;) by default, as these work universally for any code point. For a curated set of commonly used named entities (such as &amp;, &lt;, &copy;, and &pi;), the tool also shows the named form alongside the numeric one. Named entities only exist for a subset of Unicode; numeric references cover the full range.
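A sketch of that fallback logic (`htmlEntity` is a hypothetical helper name, and the named-entity map is abbreviated):

```javascript
// Prefer a named entity where one exists; always provide the numeric form.
const NAMED = { 38: "&amp;", 60: "&lt;", 62: "&gt;", 169: "&copy;", 960: "&pi;" };

function htmlEntity(cp) {
  const numeric = "&#" + cp + ";";
  return NAMED[cp] ? `${NAMED[cp]} (${numeric})` : numeric;
}

console.log(htmlEntity(169));    // "&copy; (&#169;)"
console.log(htmlEntity(128512)); // "&#128512;"
```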