Supports all Unicode including emoji, CJK, and control characters.

About

Every character rendered on screen maps to a numeric code point in the Unicode standard. Misidentifying a code point leads to encoding corruption, broken internationalization, and silent data loss in databases that reject out-of-range values. This tool converts any input character - including supplementary plane symbols like emoji and CJK ideographs beyond U+FFFF - into its precise decimal, hexadecimal, octal, binary, or HTML entity representation using codePointAt rather than the legacy charCodeAt, which fails on surrogate pairs. It handles the full Unicode range from U+0000 to U+10FFFF.
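A minimal sketch of the core lookup, assuming Node or a browser console (`toCodePoint` is a hypothetical helper name, not part of the tool):

```javascript
// Read the first code point of a string; codePointAt handles
// supplementary-plane characters that charCodeAt would split.
const toCodePoint = (ch) => ch.codePointAt(0);

const cp = toCodePoint("😀");        // supplementary-plane character
console.log(cp);                      // 128512
console.log("0x" + cp.toString(16)); // "0x1f600"
console.log("😀".charCodeAt(0));     // 55357 -- only the high surrogate
```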

Note: this tool operates on Unicode scalar values. Grapheme clusters such as flag sequences or skin-tone modified emoji are not treated as single units; each constituent code point is converted separately. Likewise, in code-unit mode each half of a surrogate pair is resolved individually. For canonical composition or decomposition, apply a normalization step (NFC/NFD) before conversion. Pro tip: when debugging encoding issues, compare the hex output here against the raw bytes stored in your database to confirm whether the mismatch is at the application layer or the storage layer.
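The normalization point can be seen directly: NFC and NFD forms of the same visible text yield different code-point sequences (a sketch, using only standard `String.prototype.normalize`):

```javascript
// Map every character of a string to its code point.
const codePoints = (s) => Array.from(s, (ch) => ch.codePointAt(0));

const nfc = "é".normalize("NFC"); // single precomposed code point
const nfd = "é".normalize("NFD"); // base letter + combining accent
console.log(codePoints(nfc)); // [233]       -> U+00E9
console.log(codePoints(nfd)); // [101, 769]  -> U+0065, U+0301
```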


Formulas

The core conversion extracts the Unicode code point of each character and represents it in the target radix. For the character at index i in the input string s:

cp = codePointAt(s, i)

The code point cp is then converted to the desired output format:

Decimal: cp10 = cp.toString(10)
Hexadecimal: cp16 = "0x" + cp.toString(16)
Octal: cp8 = "0o" + cp.toString(8)
Binary: cp2 = cp.toString(2)
HTML Entity: "&#" + cp10 + ";"

Where cp = the Unicode code point (a non-negative integer in the range 0 to 1,114,111 or 0x10FFFF), s = the input string, and i = the character index. The function codePointAt correctly handles supplementary plane characters (code points > 0xFFFF) that are stored as surrogate pairs in JavaScript's UTF-16 internal encoding, unlike the legacy charCodeAt which returns the individual surrogate values.
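The formulas above can be sketched as a single function (`formatCodePoint` is a hypothetical helper name):

```javascript
// Render one code point in every output format the tool describes.
function formatCodePoint(cp) {
  return {
    decimal: cp.toString(10),
    hex: "0x" + cp.toString(16).toUpperCase(),
    octal: "0o" + cp.toString(8),
    binary: cp.toString(2),
    entity: "&#" + cp.toString(10) + ";",
  };
}

console.log(formatCodePoint("π".codePointAt(0)));
// { decimal: '960', hex: '0x3C0', octal: '0o1700',
//   binary: '1111000000', entity: '&#960;' }
```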

For reverse conversion (code point → character):

c = fromCodePoint(cp)
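In JavaScript this is `String.fromCodePoint`; `Number()` already understands the 0x/0o/0b prefixes, so parsing prefixed input needs no extra code (a sketch):

```javascript
// Rebuild characters from numeric code points.
console.log(String.fromCodePoint(0x1f600));          // "😀"
console.log(String.fromCodePoint(Number("0x4E16"))); // "世"
console.log(String.fromCodePoint(72, 105));          // "Hi"
```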

Reference Data

| Character | Name | Decimal | Hex | HTML Entity | UTF-8 Bytes |
|---|---|---|---|---|---|
| NUL | Null | 0 | 0x00 | | 00 |
| TAB | Horizontal Tab | 9 | 0x09 | &#9; | 09 |
| LF | Line Feed | 10 | 0x0A | &#10; | 0A |
| CR | Carriage Return | 13 | 0x0D | &#13; | 0D |
| (space) | Space | 32 | 0x20 | &#32; | 20 |
| ! | Exclamation Mark | 33 | 0x21 | &#33; | 21 |
| 0 | Digit Zero | 48 | 0x30 | &#48; | 30 |
| A | Latin Capital A | 65 | 0x41 | &#65; | 41 |
| Z | Latin Capital Z | 90 | 0x5A | &#90; | 5A |
| a | Latin Small A | 97 | 0x61 | &#97; | 61 |
| z | Latin Small Z | 122 | 0x7A | &#122; | 7A |
| ~ | Tilde | 126 | 0x7E | &#126; | 7E |
| DEL | Delete | 127 | 0x7F | | 7F |
| © | Copyright Sign | 169 | 0xA9 | &#169; | C2 A9 |
| ® | Registered Sign | 174 | 0xAE | &#174; | C2 AE |
| ñ | Latin Small N with Tilde | 241 | 0xF1 | &#241; | C3 B1 |
| α | Greek Small Alpha | 945 | 0x3B1 | &#945; | CE B1 |
| π | Greek Small Pi | 960 | 0x3C0 | &#960; | CF 80 |
| ☃ | Snowman | 9731 | 0x2603 | &#9731; | E2 98 83 |
| ❤ | Heavy Black Heart | 10084 | 0x2764 | &#10084; | E2 9D A4 |
| 世 | CJK Ideograph (World) | 19990 | 0x4E16 | &#19990; | E4 B8 96 |
| 😀 | Grinning Face Emoji | 128512 | 0x1F600 | &#128512; | F0 9F 98 80 |
| 💡 | Electric Light Bulb | 128161 | 0x1F4A1 | &#128161; | F0 9F 92 A1 |
| 🌍 | Earth Globe Europe-Africa | 127757 | 0x1F30D | &#127757; | F0 9F 8C 8D |

Frequently Asked Questions

What is the difference between charCodeAt and codePointAt?

The method charCodeAt returns the UTF-16 code unit at a given index, which is a value between 0 and 65535 (0xFFFF). Characters outside the Basic Multilingual Plane (BMP), such as emoji, are encoded as two surrogate code units. charCodeAt returns only one surrogate at a time, giving an incorrect result. In contrast, codePointAt correctly reads both surrogates and returns the full code point up to 0x10FFFF. This tool uses codePointAt by default to ensure accuracy across the entire Unicode range.
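The difference is easy to see in a console (a minimal sketch):

```javascript
// charCodeAt sees the surrogate pair; codePointAt sees the full scalar.
const s = "😀";
console.log(s.length);         // 2 -- two UTF-16 code units
console.log(s.charCodeAt(0));  // 55357 (0xD83D, high surrogate)
console.log(s.charCodeAt(1));  // 56832 (0xDE00, low surrogate)
console.log(s.codePointAt(0)); // 128512 (0x1F600)
```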

How does the tool handle emoji and other supplementary plane characters?

Emoji and supplementary plane characters have code points above 0xFFFF. In JavaScript's internal UTF-16 encoding, they occupy two 16-bit code units (a surrogate pair). This tool iterates using the Symbol.iterator protocol (via Array.from), which correctly yields one string element per code point rather than per code unit. Each element is then passed to codePointAt(0) to extract the full scalar value. For example, the grinning face emoji yields code point 128512 (0x1F600), not two separate surrogate values.
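The iteration strategy described above, as a sketch:

```javascript
// Array.from iterates per code point, not per code unit.
const input = "A😀B";
console.log(input.length);      // 4 code units
console.log(Array.from(input)); // ["A", "😀", "B"] -- 3 code points
console.log(Array.from(input, (ch) => ch.codePointAt(0)));
// [65, 128512, 66]
```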

Is the binary output the same as the UTF-8 encoding?

No. The binary output shown is the direct base-2 representation of the Unicode code point integer. UTF-8 is a variable-length encoding that adds framing bits: a 2-byte UTF-8 sequence uses the pattern 110xxxxx 10xxxxxx, a 3-byte sequence uses 1110xxxx 10xxxxxx 10xxxxxx, and a 4-byte sequence uses 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. The code point binary and the UTF-8 binary are related but not identical. This tool outputs the code point value in the selected radix, not the wire encoding.
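A sketch of the distinction for U+00F1 (ñ), using the standard `TextEncoder` to obtain the wire bytes:

```javascript
// Code point binary vs. UTF-8 wire bytes.
const cp = "ñ".codePointAt(0);
console.log(cp.toString(2)); // "11110001" -- the code point, 241

const bytes = new TextEncoder().encode("ñ"); // Uint8Array [0xC3, 0xB1]
console.log([...bytes].map((b) => b.toString(2).padStart(8, "0")));
// ["11000011", "10110001"] -- 110xxxxx 10xxxxxx framing around the same bits
```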

Can I convert a code point back to a character?

Yes. The tool includes a reverse mode. Enter a numeric code point (decimal, or prefixed with 0x for hex, 0o for octal, or 0b for binary) in the reverse input field. The tool calls String.fromCodePoint to reconstruct the character. Multiple code points can be separated by spaces or commas. Invalid values outside the range 0 to 0x10FFFF, or lone surrogates (0xD800 to 0xDFFF), will produce an error.
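The validation described above can be sketched as follows (`parseCodePoint` is a hypothetical helper name; the surrogate check is explicit because String.fromCodePoint itself accepts lone surrogates):

```javascript
// Parse a prefixed or decimal code point and reject invalid values.
function parseCodePoint(text) {
  const cp = Number(text.trim()); // accepts 0x.., 0o.., 0b.., or decimal
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) {
    throw new RangeError("code point out of range: " + text);
  }
  if (cp >= 0xd800 && cp <= 0xdfff) {
    throw new RangeError("lone surrogate: " + text);
  }
  return String.fromCodePoint(cp);
}

console.log(parseCodePoint("0x1F4A1")); // "💡"
console.log(parseCodePoint("65"));      // "A"
```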

How are control characters displayed?

Control characters have no visible glyph. The tool displays a descriptive label such as NUL, SOH, STX, ETX, TAB, LF, CR, ESC, etc., in place of the raw character. The numeric code point values are still accurate. This prevents the output table from appearing broken or having invisible cells. The Unicode name is sourced from an internal lookup table covering all C0 and C1 control codes (0x00–0x1F and 0x7F–0x9F).
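A sketch of that lookup (abbreviated to a few labels here; the tool's internal table covers all C0 and C1 codes, and `displayChar` is a hypothetical helper name):

```javascript
// Replace invisible control characters with descriptive labels.
const CONTROL_LABELS = { 0x00: "NUL", 0x09: "TAB", 0x0a: "LF", 0x0d: "CR", 0x1b: "ESC", 0x7f: "DEL" };

function displayChar(cp) {
  const isControl = cp <= 0x1f || (cp >= 0x7f && cp <= 0x9f);
  return isControl ? (CONTROL_LABELS[cp] ?? "CTRL") : String.fromCodePoint(cp);
}

console.log(displayChar(10)); // "LF"
console.log(displayChar(65)); // "A"
```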

Does the tool output named or numeric HTML entities?

The tool outputs numeric decimal entities (&#cp;) by default, as these work universally for any code point. For a curated set of commonly used named entities (such as &amp;, &lt;, &copy;, and &pi;), the tool also shows the named form alongside the numeric one. Named entities only exist for a subset of Unicode; numeric references cover the full range.
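A sketch of that fallback logic (`htmlEntity` is a hypothetical helper name, and the named-entity map is abbreviated):

```javascript
// Prefer a named entity where one exists; always provide the numeric form.
const NAMED = { 38: "&amp;", 60: "&lt;", 62: "&gt;", 169: "&copy;", 960: "&pi;" };

function htmlEntity(cp) {
  const numeric = "&#" + cp + ";";
  return NAMED[cp] ? `${NAMED[cp]} (${numeric})` : numeric;
}

console.log(htmlEntity(169));    // "&copy; (&#169;)"
console.log(htmlEntity(128512)); // "&#128512;"
```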