About

Incorrect character encoding in HTML documents causes rendering failures across browsers and locales. A single unescaped character outside the ASCII range can break XML parsers, corrupt RSS feeds, or display as � (the replacement character) on systems with mismatched encodings. This tool converts every character in your input to its hexadecimal HTML entity form &#xHHHH;, using the Unicode code point value. It handles the full Unicode range from U+0000 to U+10FFFF, including astral plane characters (emoji, CJK extensions, mathematical symbols) that require surrogate pairs in UTF-16 but map to single code points.

The reverse decoder parses &#xHHHH; patterns using strict regex validation and reconstructs characters via String.fromCodePoint. Note: this tool does not handle named entities like &amp;. It operates exclusively on hexadecimal numeric references. For documents served as UTF-8, hex entities are technically redundant for most characters. They remain essential when embedding content in attribute values, working within ASCII-only transport layers (email headers, legacy databases), or preventing XSS by encoding user-generated content before insertion into HTML.

Formulas

Each character in the input string is converted to its Unicode code point, then expressed as a hexadecimal HTML numeric character reference.

For each character c in input string S:
  codePoint = codePointAt(c)
  hexValue = toString(codePoint, 16)
  entity = "&#x" + toUpperCase(hexValue) + ";"

Where codePoint is the Unicode scalar value ranging from 0 to 10FFFF₁₆ (0 to 1,114,111₁₀). Characters in the Basic Multilingual Plane (BMP) have code points ≤ FFFF₁₆ and produce entities with 1 to 4 hex digits. Supplementary plane characters (emoji, rare CJK) have code points > FFFF₁₆ and produce entities with 5 or 6 hex digits.
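The pseudocode above can be sketched in JavaScript as follows. This is a minimal illustration, not the tool's actual source; the function name encodeHexEntities is hypothetical.

```javascript
// Hypothetical helper name; a sketch of the encoding steps described above.
function encodeHexEntities(input) {
  let out = "";
  // for...of iterates by code point, so a surrogate pair arrives as one string.
  for (const ch of input) {
    const codePoint = ch.codePointAt(0);
    const hexValue = codePoint.toString(16).toUpperCase();
    out += "&#x" + hexValue + ";";
  }
  return out;
}

console.log(encodeHexEntities("A\u20AC")); // &#x41;&#x20AC;
```

Iterating with for...of rather than indexing by position is what keeps supplementary characters intact as a single entity.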

Reverse decoding regex pattern:
match(/&#x([0-9a-fA-F]+);/g)
character = String.fromCodePoint(parseInt(hexDigits, 16))

Where hexDigits is the captured group from the regex. The parseInt function converts the hex string to an integer, and String.fromCodePoint reconstructs the original character. Unlike String.fromCharCode, which accepts only 16-bit code units, String.fromCodePoint accepts supplementary code points above U+FFFF directly and produces the required surrogate pair.
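A minimal sketch of the decoding step, assuming a replace-based approach with the same pattern (the tool itself may be structured differently). The name decodeHexEntities and the choice to leave out-of-range references untouched are assumptions made for illustration.

```javascript
// Hypothetical helper name; decodes only hexadecimal numeric references.
function decodeHexEntities(input) {
  return input.replace(/&#x([0-9a-fA-F]+);/g, (match, hexDigits) => {
    const codePoint = parseInt(hexDigits, 16);
    // String.fromCodePoint throws a RangeError above 0x10FFFF,
    // so (as an assumption) such references are returned unchanged.
    if (codePoint > 0x10FFFF) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeHexEntities("&#x41;&#x20AC;")); // A€
```

Named entities and decimal references do not match the pattern, so they pass through unmodified.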

Reference Data

Character	Name	Code Point	Hex Entity	Category
&	Ampersand	U+0026	&#x26;	Must-Escape in HTML
<	Less-Than Sign	U+003C	&#x3C;	Must-Escape in HTML
>	Greater-Than Sign	U+003E	&#x3E;	Must-Escape in HTML
"	Quotation Mark	U+0022	&#x22;	Must-Escape in Attributes
'	Apostrophe	U+0027	&#x27;	Must-Escape in Attributes
 	Non-Breaking Space	U+00A0	&#xA0;	Whitespace
©	Copyright Sign	U+00A9	&#xA9;	Special Symbol
®	Registered Sign	U+00AE	&#xAE;	Special Symbol
™	Trade Mark Sign	U+2122	&#x2122;	Special Symbol
€	Euro Sign	U+20AC	&#x20AC;	Currency
£	Pound Sign	U+00A3	&#xA3;	Currency
¥	Yen Sign	U+00A5	&#xA5;	Currency
¢	Cent Sign	U+00A2	&#xA2;	Currency
—	Em Dash	U+2014	&#x2014;	Punctuation
–	En Dash	U+2013	&#x2013;	Punctuation
…	Horizontal Ellipsis	U+2026	&#x2026;	Punctuation
•	Bullet	U+2022	&#x2022;	Punctuation
°	Degree Sign	U+00B0	&#xB0;	Math/Science
±	Plus-Minus Sign	U+00B1	&#xB1;	Math/Science
×	Multiplication Sign	U+00D7	&#xD7;	Math/Science
÷	Division Sign	U+00F7	&#xF7;	Math/Science
∞	Infinity	U+221E	&#x221E;	Math/Science
π	Greek Small Pi	U+03C0	&#x3C0;	Greek Letter
Ω	Greek Capital Omega	U+03A9	&#x3A9;	Greek Letter
α	Greek Small Alpha	U+03B1	&#x3B1;	Greek Letter
←	Leftwards Arrow	U+2190	&#x2190;	Arrow
→	Rightwards Arrow	U+2192	&#x2192;	Arrow
↑	Upwards Arrow	U+2191	&#x2191;	Arrow
↓	Downwards Arrow	U+2193	&#x2193;	Arrow
♠	Black Spade Suit	U+2660	&#x2660;	Miscellaneous

Frequently Asked Questions

Both are numeric character references that produce the same rendered character. The hexadecimal form &#x41; and the decimal form &#65; both render as the letter "A". Hexadecimal is preferred when working with Unicode charts and specifications because Unicode code points are conventionally written in hex (U+0041). Browsers parse both forms identically per the HTML5 specification.
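To see that the hexadecimal and decimal forms are two spellings of the same number, both can be generated from one code point. The helper names toHexEntity and toDecimalEntity are hypothetical and used only for this sketch.

```javascript
// Hypothetical helpers: two notations for the same code point.
function toHexEntity(codePoint) {
  return "&#x" + codePoint.toString(16).toUpperCase() + ";";
}

function toDecimalEntity(codePoint) {
  return "&#" + codePoint.toString(10) + ";";
}

const cp = "A".codePointAt(0); // 65, written U+0041 in Unicode notation
console.log(toHexEntity(cp));     // &#x41;
console.log(toDecimalEntity(cp)); // &#65;
```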

Yes. A single UTF-8 character like "A" occupies 1 byte, but its hex entity &#x41; occupies 6 bytes. For characters outside ASCII, the ratio is less dramatic: "€" is 3 bytes in UTF-8, while &#x20AC; is 8 bytes. For a typical page with 5,000 characters of mostly ASCII body text, full hex encoding would increase the payload roughly sixfold, to about 30,000 bytes. Use hex entities selectively for characters that must be escaped or that might cause encoding issues in your pipeline.
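The size difference can be measured directly. The sketch below assumes an environment that provides TextEncoder (modern browsers and Node.js); the function names utf8Bytes and hexEntityBytes are hypothetical.

```javascript
// Byte length of a string when stored as UTF-8.
function utf8Bytes(text) {
  return new TextEncoder().encode(text).length;
}

// Byte length of the same string after full hex-entity encoding.
// Each entity is "&#x" (3 bytes) + hex digits + ";" (1 byte), all ASCII.
function hexEntityBytes(text) {
  let total = 0;
  for (const ch of text) {
    total += 4 + ch.codePointAt(0).toString(16).length;
  }
  return total;
}

console.log(utf8Bytes("A"), hexEntityBytes("A"));           // 1 6
console.log(utf8Bytes("\u20AC"), hexEntityBytes("\u20AC")); // 3 8
```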

The tool uses JavaScript's codePointAt() method, which correctly handles characters above U+FFFF (the BMP boundary). For example, the emoji 😀 has code point U+1F600 and converts to &#x1F600;. This differs from charCodeAt(), which would return surrogate pair values (0xD83D and 0xDE00) that do not form valid standalone entities. The hex entity &#x1F600; is rendered correctly by all modern browsers.
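The difference between the two methods is easy to observe on a single emoji. This short sketch uses only standard String methods; the variable names are illustrative.

```javascript
const emoji = "\u{1F600}"; // 😀, stored as two UTF-16 code units

// charCodeAt reads one 16-bit code unit at a time.
const high = emoji.charCodeAt(0).toString(16).toUpperCase(); // "D83D"
const low = emoji.charCodeAt(1).toString(16).toUpperCase();  // "DE00"

// codePointAt combines the surrogate pair into the real code point.
const full = emoji.codePointAt(0).toString(16).toUpperCase(); // "1F600"

console.log(emoji.length, high, low, full); // 2 D83D DE00 1F600
```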

Yes. The Unicode standard designates U+D800 through U+DFFF as surrogate code points. These are not valid scalar values and must not appear as numeric character references in HTML. Additionally, the noncharacters U+FDD0 to U+FDEF and U+xFFFE to U+xFFFF (the last two code points of each plane) are permanently reserved. The HTML5 parser treats references to both groups as parse errors: a surrogate reference is replaced with U+FFFD (the replacement character), while a noncharacter reference is flagged but passed through. This tool processes them if present, but they will not render as meaningful characters in browsers.
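A pre-check for these ranges could look like the sketch below. It is not part of the tool as described; classifyCodePoint and its return labels are hypothetical.

```javascript
// Hypothetical validator for the problematic ranges listed above.
function classifyCodePoint(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10FFFF) return "out-of-range";
  if (cp >= 0xD800 && cp <= 0xDFFF) return "surrogate";
  if (cp >= 0xFDD0 && cp <= 0xFDEF) return "noncharacter";
  // The last two code points of every plane end in FFFE or FFFF.
  if ((cp & 0xFFFE) === 0xFFFE) return "noncharacter";
  return "ok";
}

console.log(classifyCodePoint(0xD83D));  // surrogate
console.log(classifyCodePoint(0x1FFFE)); // noncharacter
console.log(classifyCodePoint(0x1F600)); // ok
```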

In most modern workflows, declaring <meta charset="UTF-8"> eliminates the need for hex entities. However, hex entities are necessary when: (1) your content passes through a system that strips non-ASCII bytes (some email relay servers, legacy databases with Latin-1 columns), (2) you embed special characters in XML text or attribute values where the encoding is uncertain (note that character references are not expanded inside CDATA sections), (3) you need to obfuscate email addresses from simple scrapers (e.g., encoding @ as &#x40;), or (4) you generate HTML fragments for insertion into third-party templates where you do not control the charset declaration.
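For ASCII-only pipelines, one option is to encode only what needs it rather than every character. The sketch below is an assumption about how such a selective mode could work, not a feature of this tool; encodeSelectively and MUST_ESCAPE are hypothetical names.

```javascript
// Characters that carry syntactic meaning in HTML text or attributes.
const MUST_ESCAPE = new Set(["&", "<", ">", '"', "'"]);

// Hypothetical selective encoder: leaves plain ASCII alone and converts
// must-escape characters and everything above U+007F to hex entities.
function encodeSelectively(input) {
  let out = "";
  for (const ch of input) {
    const cp = ch.codePointAt(0);
    if (cp > 0x7F || MUST_ESCAPE.has(ch)) {
      out += "&#x" + cp.toString(16).toUpperCase() + ";";
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(encodeSelectively("Tom & Jerry \u00A9 2024"));
// Tom &#x26; Jerry &#xA9; 2024
```

The output is pure ASCII, so it survives transports that strip or mangle high bytes while staying far smaller than a fully encoded string.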

No. The HTML specification treats hex digits as case-insensitive, and the x prefix may also be written in either case. &#x41; and &#X41; both produce the character "A", just as &#x20ac; and &#x20AC; both produce "€". This tool outputs uppercase hex digits (e.g., &#x20AC;) for consistency with the Unicode standard notation (U+20AC), but lowercase forms are equally valid.