Category Text Analysis

#	Char	Codepoint	Name	Block	Category	Version

About

Every assigned character entered the Unicode standard in a specific version. A document mixing characters from Unicode 1.0 (1991) and Unicode 16.0 (2024) may render correctly on modern systems but fail on older terminals, embedded devices, or legacy databases. This tool extracts each codepoint (displayed as U+XXXX) from your input and maps it to the Unicode version that introduced it. It also identifies the assigned block (e.g., "Latin Extended-B", "CJK Unified Ideographs") and the general category (L for Letter, N for Number, S for Symbol). Use it to audit compatibility before deploying multilingual content or embedding special symbols in systems with constrained font support.

Limitation: this tool covers assigned codepoint ranges per version. Private Use Area characters (U+E000 - U+F8FF) are reported as version 1.1, per the original allocation, but carry no standard glyph. Unassigned codepoints return "Unassigned". Surrogate codepoints (U+D800 - U+DFFF) exist only to form UTF-16 surrogate pairs and are not valid characters on their own.

Formulas

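Each character in the input string (text below) is decomposed into its Unicode codepoint using JavaScript's codePointAt method, which reads a surrogate pair as a single codepoint for astral plane characters (codepoints above U+FFFF). After such a character, the index i must advance by two UTF-16 code units so that the trailing surrogate is not read again.

cp = text.codePointAt(i)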
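The decomposition step might look like the following minimal sketch. The helper name toCodePoints is hypothetical and not taken from the tool's source; it only illustrates how codePointAt can be stepped through a string without reading the second half of a surrogate pair as its own codepoint.

```javascript
// Hypothetical helper (not from the tool's source): split a string into
// codepoints by stepping through it with String.prototype.codePointAt.
function toCodePoints(text) {
  const result = [];
  let i = 0;
  while (i < text.length) {
    const cp = text.codePointAt(i);
    result.push(cp);
    // Astral codepoints occupy two UTF-16 code units, so skip both.
    i += cp > 0xffff ? 2 : 1;
  }
  return result;
}

console.log(toCodePoints("A€😀")); // [65, 8364, 128512]
```

A lone surrogate in the input is returned as its own value (for example 0xD83D), which lets a caller flag it as an invalid character.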

The codepoint cp is then looked up against a sorted array of Unicode version assignment ranges. Each range is a tuple:

[start, end, version]

A character belongs to Unicode version v if:

start ≤ cp ≤ end ∧ version = v

Where cp is the integer codepoint value, start and end are the inclusive range boundaries, and version is a string such as "6.0". The hex representation displayed is computed via:

hex = cp.toString(16).toUpperCase().padStart(4, "0")
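As an illustration of the lookup and formatting described above, here is a small sketch that binary-searches a sorted range table. The names VERSION_RANGES, lookupVersion, and formatCodePoint are assumptions made for this example, and the table holds only a handful of sample entries; a real table would cover every assigned range.

```javascript
// Abridged, illustrative range table: [start, end, version], sorted by start.
// A complete table would be much larger; these rows are samples only.
const VERSION_RANGES = [
  [0x20b9, 0x20b9, "6.0"],   // Indian Rupee sign
  [0x20ba, 0x20ba, "6.2"],   // Turkish Lira sign
  [0x20bd, 0x20bd, "7.0"],   // Ruble sign
  [0x20bf, 0x20bf, "10.0"],  // Bitcoin sign
  [0x32ff, 0x32ff, "12.1"],  // Reiwa era square character
  [0x3400, 0x4db5, "3.0"],   // CJK Extension A (original allocation)
  [0x20000, 0x2a6d6, "3.1"], // CJK Extension B (original allocation)
];

// Binary search for the range satisfying start <= cp <= end.
function lookupVersion(cp, ranges = VERSION_RANGES) {
  let lo = 0;
  let hi = ranges.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [start, end, version] = ranges[mid];
    if (cp < start) hi = mid - 1;
    else if (cp > end) lo = mid + 1;
    else return version;
  }
  // Not covered by the (sample) table.
  return "Unassigned";
}

// Format a codepoint as U+XXXX, padded to at least four hex digits.
function formatCodePoint(cp) {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

console.log(formatCodePoint(0x20bf), lookupVersion(0x20bf)); // U+20BF 10.0
```

Binary search requires the ranges to be sorted and non-overlapping, which matches the "sorted array" described above.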

The general category is determined by testing the codepoint against category ranges (e.g., L for letters matching alphabetic ranges, N for numbers matching digit/numeric ranges). Block membership is resolved similarly against the ~330 named Unicode blocks.
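The tool is described as using its own category range tables. As an alternative sketch, the major category letter can also be obtained from the JavaScript engine's built-in Unicode property escapes (ES2018). The names MAJOR_CATEGORIES and majorCategory are hypothetical, and results for recently added characters depend on the Unicode version the engine itself ships with.

```javascript
// Alternative sketch using regex Unicode property escapes rather than
// hand-maintained range tables. Each entry is [major category, pattern].
const MAJOR_CATEGORIES = [
  ["L", /^\p{L}$/u], // Letter
  ["M", /^\p{M}$/u], // Mark
  ["N", /^\p{N}$/u], // Number
  ["P", /^\p{P}$/u], // Punctuation
  ["S", /^\p{S}$/u], // Symbol
  ["Z", /^\p{Z}$/u], // Separator
  ["C", /^\p{C}$/u], // Other (control, format, private use, unassigned)
];

function majorCategory(cp) {
  const ch = String.fromCodePoint(cp);
  for (const [letter, pattern] of MAJOR_CATEGORIES) {
    if (pattern.test(ch)) return letter;
  }
  return "C"; // defensive default; every codepoint should match one entry
}

console.log(majorCategory(0x41));   // L
console.log(majorCategory(0x20ac)); // S
```

Two-letter subcategories such as Lu or Cc can be tested the same way, for example with /^\p{Lu}$/u.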

Reference Data

Unicode Version	Release Year	Total Characters Added	Notable Additions
1.0	1991	~7,129	Basic Latin, Greek, Cyrillic, CJK core
1.1	1993	~28,327	CJK Unified Ideographs (20,902)
2.0	1996	~6,516	Surrogate mechanism, Hangul Syllables rewrite, Tibetan (re-encoded)
3.0	1999	~10,307	Cherokee, Ethiopic, Khmer, Mongolian, Myanmar, Sinhala
3.1	2001	~44,946	CJK Extension B (42,711), Deseret, Gothic, Old Italic, Musical Symbols
3.2	2002	~1,016	Philippine scripts (Buhid, Hanunoo, Tagalog, Tagbanwa)
4.0	2003	~1,226	Cypriot, Limbu, Linear B, Osmanya, Shavian, Tai Le, Ugaritic
4.1	2005	~1,273	Buginese, Coptic, Glagolitic, New Tai Lue, Old Persian
5.0	2006	~1,369	N'Ko, Phags-pa, Phoenician, currency symbols
5.1	2008	~1,624	Carian, Lycian, Lydian, Vai, Sundanese
5.2	2009	~6,648	CJK Extension C, Egyptian Hieroglyphs, Tai Tham
6.0	2010	~2,088	Emoji first batch, Indian Rupee sign ₹, Mandaic, Batak
6.1	2012	~732	Chakma, Miao, Sharada, Sora Sompeng, Takri
6.2	2012	1	Turkish Lira sign ₺
6.3	2013	5	Bidirectional formatting characters
7.0	2014	~2,834	Ruble sign ₽, Bassa Vah, Duployan, Grantha, Khojki, Pau Cin Hau
8.0	2015	~7,716	CJK Extension E, Cherokee lowercase, Emoji skin tones (Fitzpatrick)
9.0	2016	~7,500	Adlam, Bhaiksuki, Tangut (6,881 chars), 72 new emoji
10.0	2017	~8,518	CJK Extension F, Bitcoin sign ₿, Zanabazar Square, Soyombo, 56 emoji
11.0	2018	~684	Dogra, Georgian Mtavruli, Hanifi Rohingya, 66 emoji
12.0	2019	~554	Elymaic, Nandinagari, Wancho, 61 emoji
12.1	2019	1	Reiwa era square character (Japanese)
13.0	2020	~5,930	CJK Extension G, Chorasmian, Dives Akuru, Yezidi, 55 emoji
14.0	2021	~838	Toto, Cypro-Minoan, Vithkuqi, Old Uyghur, Tangsa, 37 emoji
15.0	2022	~4,489	CJK Extension H, Kawi, Nag Mundari, 31 emoji
15.1	2023	627	CJK Extension I (622 ideographs), 5 ideographic description characters
16.0	2024	~5,185	Egyptian Hieroglyphs Extended-A, Garay, Gurung Khema, Kirat Rai, Sunuwar, 7 emoji

Frequently Asked Questions

Many modern emoji are composed of multiple codepoints joined by Zero Width Joiner (U+200D). For example, a four-person family emoji consists of four person codepoints joined by three ZWJ characters, seven codepoints in total. Each constituent codepoint may originate from a different Unicode version. This tool decomposes the full sequence and reports each codepoint individually. Variation selectors (U+FE0E and U+FE0F) also appear as separate entries.
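To make the decomposition concrete, the sketch below splits a ZWJ sequence into its constituent codepoints and labels joiners and variation selectors. The function name describeSequence and the role labels are assumptions made for this example, not the tool's actual output format.

```javascript
// Hypothetical helper: list every codepoint in an emoji sequence and flag
// the joiner (U+200D) and the variation selectors (U+FE0E, U+FE0F).
function describeSequence(text) {
  // Array.from iterates a string by codepoint, not by UTF-16 code unit.
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    let role = "character";
    if (cp === 0x200d) role = "zero width joiner";
    else if (cp === 0xfe0e || cp === 0xfe0f) role = "variation selector";
    const hex = "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
    return { codePoint: hex, role };
  });
}

// Man + ZWJ + woman + ZWJ + girl + ZWJ + boy
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(family.length);                   // 11 UTF-16 code units
console.log(describeSequence(family).length); // 7 codepoints
```

Each returned entry can then be looked up separately, which is why one visible emoji may produce rows from several Unicode versions.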

CJK Unified Ideographs span the base block plus Extensions A through I (as of Unicode 16.0), each assigned to a specific Unicode version. The base CJK block (U+4E00 - U+9FFF) was partially allocated in version 1.1 and extended in later versions. This tool maps the major extension blocks accurately: Extension A to 3.0, Extension B to 3.1, Extension C to 5.2, and so on through Extension I in 15.1. Individual codepoint additions within existing blocks across minor versions are approximated to the block's primary version.

The Basic Multilingual Plane Private Use Area (U+E000 - U+F8FF) was allocated in Unicode 1.1. The Supplementary Private Use Areas (U+F0000 - U+FFFFD and U+100000 - U+10FFFD) were allocated in 2.0. These codepoints carry no standard glyph or name. The tool reports their version as the version that allocated the range. The actual glyphs depend entirely on installed fonts.
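A range check for these special areas could be sketched as follows. The function name specialRange and its return labels are hypothetical. For the supplementary areas the sketch uses U+F0000 - U+FFFFD and U+100000 - U+10FFFD, since the last two codepoints of each plane are noncharacters rather than private use.

```javascript
// Hypothetical classifier for codepoints that never carry a standard glyph.
// Returns a label for surrogates and private-use codepoints, otherwise null.
function specialRange(cp) {
  if (cp >= 0xd800 && cp <= 0xdfff) return "surrogate";
  if (cp >= 0xe000 && cp <= 0xf8ff) return "private use (BMP)";
  if (
    (cp >= 0xf0000 && cp <= 0xffffd) ||  // Supplementary Private Use Area-A
    (cp >= 0x100000 && cp <= 0x10fffd)   // Supplementary Private Use Area-B
  ) {
    return "private use (supplementary)";
  }
  return null;
}

console.log(specialRange(0xe000));  // private use (BMP)
console.log(specialRange(0xd800));  // surrogate
console.log(specialRange(0x41));    // null
```

Running such a check before the version lookup lets a caller label these rows explicitly instead of showing a missing glyph.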

This tool reports which Unicode version introduced each character. To determine system compatibility, compare the reported version against your target platform's Unicode support level. For example, Android 10 supports Unicode 12.0, so any character from version 13.0 or later may not render. Windows 10 (May 2020) supports up to 13.0. Font coverage is a separate concern: a system may support a Unicode version but lack glyphs in installed fonts for specific blocks.
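The comparison against a target platform can be sketched like this. The function names compareVersions and findUnsupported and the row shape { char, version } are assumptions made for illustration. Versions are compared numerically because a plain string comparison would rank "9.0" above "12.1".

```javascript
// Compare two "major.minor" version strings numerically.
// Returns a negative number, zero, or a positive number.
function compareVersions(a, b) {
  const [aMajor, aMinor = 0] = a.split(".").map(Number);
  const [bMajor, bMinor = 0] = b.split(".").map(Number);
  return aMajor - bMajor || aMinor - bMinor;
}

// Hypothetical audit step: keep only rows newer than the target version.
// Assumes every row.version is a "major.minor" string.
function findUnsupported(rows, targetVersion) {
  return rows.filter((row) => compareVersions(row.version, targetVersion) > 0);
}

const rows = [
  { char: "₹", version: "6.0" },
  { char: "₿", version: "10.0" },
];
console.log(findUnsupported(rows, "6.3")); // only the ₿ row remains
```

A row that passes this check may still fail to render if no installed font covers its block, so the result is a lower bound on potential problems.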

Control characters (U+0000 - U+001F and U+007F - U+009F) were inherited from ASCII and ISO 8859-1 and were included in Unicode 1.0/1.1. They have no visual representation. Because control characters have no formal Unicode character name, the tool displays them with their standard name alias (e.g., NULL, LINE FEED) and marks them as category Cc (Control). They are legitimate codepoints and will appear in the analysis.

Yes. Results are listed in strict input order, character by character. For astral plane characters encoded as surrogate pairs in JavaScript's internal UTF-16, the tool yields a single codepoint per surrogate pair rather than one row per code unit. However, combining character sequences (e.g., a base letter followed by combining diacritical marks) appear as separate rows, one per codepoint. The position column indicates the zero-based character index in the original string.
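The row ordering can be illustrated with a short sketch. The function name toRows and the row fields are hypothetical, and the sketch assumes the position column counts codepoints rather than UTF-16 code units, which the description above does not state explicitly.

```javascript
// Hypothetical row builder: one row per codepoint, in input order.
// Assumption: "index" counts codepoints, not UTF-16 code units.
function toRows(text) {
  const rows = [];
  let index = 0;
  // for...of iterates a string by codepoint, so a surrogate pair is one step.
  for (const ch of text) {
    rows.push({ index, char: ch, codePoint: ch.codePointAt(0) });
    index += 1;
  }
  return rows;
}

// "e" followed by COMBINING ACUTE ACCENT, then an astral emoji.
const rows = toRows("e\u0301\u{1F600}");
console.log(rows.length); // 3
```

Here the accented letter yields two rows (U+0065 and U+0301) because it is a combining sequence, while the emoji yields one row despite using two code units.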