About

Incorrect character encoding in HTML documents causes rendering failures across browsers and locales. A single unescaped character outside the ASCII range can break XML parsers, corrupt RSS feeds, or display as ๏ฟฝ (the replacement character) on systems with mismatched encodings. This tool converts every character in your input to its hexadecimal HTML entity form HHHH;, using the Unicode code point value. It handles the full Unicode range from U+0000 to U+10FFFF, including astral plane characters (emoji, CJK extensions, mathematical symbols) that require surrogate pairs in UTF-16 but map to single code points.

The reverse decoder finds &#xHHHH; patterns with a regular expression and reconstructs each character with String.fromCodePoint. Note that this tool does not handle named entities such as &amp; and works exclusively with hexadecimal numeric references. For documents served as UTF-8, hex entities are technically redundant for most characters. They remain essential for syntactically significant characters (such as & and < in text, or quotes inside attribute values), for ASCII-only transport layers (email headers, legacy databases), and for escaping user-generated content before inserting it into HTML as part of XSS prevention.

Formulas

Each character in the input string is converted to its Unicode code point, then expressed as a hexadecimal HTML numeric character reference.

For each code point c in input string S:
    codePoint = codePointAt(c)
    hexValue = toString(codePoint, 16)
    entity = "&#x" + toUpperCase(hexValue) + ";"

Where codePoint is the Unicode code point, ranging from 0 to 10FFFF in hexadecimal (0 to 1,114,111 in decimal). Characters in the Basic Multilingual Plane (BMP) have code points ≤ FFFF and produce entities with 1 to 4 hex digits. Supplementary-plane characters (emoji, rare CJK) have code points > FFFF and produce entities with 5 hex digits, or 6 hex digits for plane 16 (U+100000 to U+10FFFF).
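The formula above can be sketched in JavaScript as follows. This is a minimal illustration, not the tool's actual source; the function name encodeHexEntities is hypothetical, and the sketch assumes hex digits are emitted without zero padding.

```javascript
// Hypothetical encoder implementing the formula above.
// Iterating with for...of yields whole code points, so an emoji is visited
// once rather than as two UTF-16 surrogate halves.
function encodeHexEntities(input) {
  let output = "";
  for (const ch of input) {
    const codePoint = ch.codePointAt(0);
    const hexValue = codePoint.toString(16).toUpperCase(); // no zero padding
    output += "&#x" + hexValue + ";";
  }
  return output;
}

console.log(encodeHexEntities("A€😀"));       // &#x41;&#x20AC;&#x1F600;
console.log(encodeHexEntities("\u{10FFFF}")); // &#x10FFFF; (six digits, plane 16)
```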

Reverse decoding regex pattern:
match(/&#x([0-9a-fA-F]+);/g)
character = String.fromCodePoint(parseInt(hexDigits, 16))

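Where hexDigits is the group captured by the regex. The parseInt function converts the hex string to an integer, and String.fromCodePoint reconstructs the original character. Unlike String.fromCharCode, which truncates its argument to 16 bits, String.fromCodePoint produces the correct surrogate pair for code points above U+FFFF. It throws a RangeError for values above 0x10FFFF, which the pattern does not rule out, so a robust decoder must check the range first.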
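A decoder following that description might look like the sketch below. The regex is the one given above; the function name decodeHexEntities and the choice to leave out-of-range references such as &#x110000; as literal text (instead of letting String.fromCodePoint throw) are assumptions, not documented tool behavior.

```javascript
// Hypothetical decoder built on the pattern above: lowercase "x",
// one or more hex digits (either case), terminated by ";".
const HEX_ENTITY = /&#x([0-9a-fA-F]+);/g;

function decodeHexEntities(input) {
  return input.replace(HEX_ENTITY, (match, hexDigits) => {
    const codePoint = parseInt(hexDigits, 16);
    // String.fromCodePoint throws a RangeError above 0x10FFFF, so keep
    // such references unchanged (an assumed policy).
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeHexEntities("&#x41;&#x20ac;&#x1F600;")); // A€😀
console.log(decodeHexEntities("&#x110000; &amp; &#65;"));  // &#x110000; &amp; &#65;
```

Named and decimal references pass through unchanged because the pattern only matches the &#x form.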

Reference Data

Character | Name | Code Point | Hex Entity | Category
& | Ampersand | U+0026 | &#x26; | Must-Escape in HTML
< | Less-Than Sign | U+003C | &#x3C; | Must-Escape in HTML
> | Greater-Than Sign | U+003E | &#x3E; | Must-Escape in HTML
" | Quotation Mark | U+0022 | &#x22; | Must-Escape in Attributes
' | Apostrophe | U+0027 | &#x27; | Must-Escape in Attributes
(invisible) | Non-Breaking Space | U+00A0 | &#xA0; | Whitespace
© | Copyright Sign | U+00A9 | &#xA9; | Special Symbol
® | Registered Sign | U+00AE | &#xAE; | Special Symbol
™ | Trade Mark Sign | U+2122 | &#x2122; | Special Symbol
€ | Euro Sign | U+20AC | &#x20AC; | Currency
£ | Pound Sign | U+00A3 | &#xA3; | Currency
¥ | Yen Sign | U+00A5 | &#xA5; | Currency
¢ | Cent Sign | U+00A2 | &#xA2; | Currency
— | Em Dash | U+2014 | &#x2014; | Punctuation
– | En Dash | U+2013 | &#x2013; | Punctuation
… | Horizontal Ellipsis | U+2026 | &#x2026; | Punctuation
• | Bullet | U+2022 | &#x2022; | Punctuation
° | Degree Sign | U+00B0 | &#xB0; | Math/Science
± | Plus-Minus Sign | U+00B1 | &#xB1; | Math/Science
× | Multiplication Sign | U+00D7 | &#xD7; | Math/Science
÷ | Division Sign | U+00F7 | &#xF7; | Math/Science
∞ | Infinity | U+221E | &#x221E; | Math/Science
π | Greek Small Pi | U+03C0 | &#x3C0; | Greek Letter
Ω | Greek Capital Omega | U+03A9 | &#x3A9; | Greek Letter
α | Greek Small Alpha | U+03B1 | &#x3B1; | Greek Letter
← | Leftwards Arrow | U+2190 | &#x2190; | Arrow
→ | Rightwards Arrow | U+2192 | &#x2192; | Arrow
↑ | Upwards Arrow | U+2191 | &#x2191; | Arrow
↓ | Downwards Arrow | U+2193 | &#x2193; | Arrow
♠ | Black Spade Suit | U+2660 | &#x2660; | Miscellaneous
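The five "Must-Escape" rows are the characters to replace when inserting untrusted text into HTML element content or quoted attribute values. A minimal sketch, assuming a hypothetical helper named escapeHtmlHex; it does not make text safe for other contexts such as inline scripts, styles, or URLs.

```javascript
// Hex entities for the five "Must-Escape" characters in the table.
const MUST_ESCAPE = {
  "&": "&#x26;",
  "<": "&#x3C;",
  ">": "&#x3E;",
  '"': "&#x22;",
  "'": "&#x27;",
};

// Hypothetical helper: a single replace() pass, so the "&" inside entities
// it has just produced is never escaped a second time.
function escapeHtmlHex(text) {
  return text.replace(/[&<>"']/g, (ch) => MUST_ESCAPE[ch]);
}

console.log(escapeHtmlHex(`<b title="x">Tom & Jerry's</b>`));
// &#x3C;b title=&#x22;x&#x22;&#x3E;Tom &#x26; Jerry&#x27;s&#x3C;/b&#x3E;
```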

Frequently Asked Questions

What is the difference between hexadecimal and decimal HTML entities?
Both are numeric character references that produce the same rendered character. The hexadecimal form &#x41; and the decimal form &#65; both render as the letter "A". Hexadecimal is preferred when working with Unicode charts and specifications because Unicode code points are conventionally written in hex (U+0041). Browsers resolve both forms to the same character, as defined by the HTML specification.
Does hex encoding increase file size?
Yes. A single UTF-8 character like "A" occupies 1 byte, but its hex entity &#x41; occupies 6 bytes. For characters outside ASCII the ratio is less dramatic: "€" is 3 bytes in UTF-8, while &#x20AC; is 8 bytes. For a typical page whose 5,000 characters of body text are mostly ASCII, full hex encoding would grow that text roughly 5× to 6× (from about 5 KB to about 30 KB). Use hex entities selectively for characters that must be escaped or that might cause encoding issues in your pipeline.
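The byte counts above can be checked with a short script. It assumes the text is stored as UTF-8 and that every character is encoded. It uses the standard TextEncoder API available in browsers and Node.js, and the helper names are illustrative.

```javascript
// UTF-8 size of a string, via the standard TextEncoder API.
function utf8Bytes(text) {
  return new TextEncoder().encode(text).length;
}

// Size after full hex encoding: "&#x" + hex digits + ";" is pure ASCII,
// so each character costs 4 bytes plus the number of hex digits.
function hexEntityBytes(text) {
  let total = 0;
  for (const ch of text) {
    total += 4 + ch.codePointAt(0).toString(16).length;
  }
  return total;
}

for (const sample of ["A", "€", "😀"]) {
  console.log(sample, utf8Bytes(sample), hexEntityBytes(sample));
}
// A 1 6
// € 3 8
// 😀 4 9
```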
How does the tool handle emoji and other characters above U+FFFF?
The tool uses JavaScript's codePointAt() method, which correctly handles characters above U+FFFF (the BMP boundary). For example, the emoji 😀 has code point U+1F600 and converts to &#x1F600;. This differs from charCodeAt(), which returns the two UTF-16 surrogate code units (0xD83D and 0xDE00). Encoded separately as &#xD83D;&#xDE00;, these are invalid references that the HTML parser replaces with U+FFFD. The hex entity &#x1F600; is rendered correctly by all modern browsers.
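The difference between the two methods shows up on a single emoji. This snippet uses only standard String methods; naiveEntities is a hypothetical name for the incorrect approach that walks code units instead of code points.

```javascript
const emoji = "😀"; // U+1F600, stored in JavaScript as two UTF-16 code units

// charCodeAt reads one 16-bit code unit at a time: the two surrogate halves.
const highUnit = emoji.charCodeAt(0); // 0xD83D
const lowUnit = emoji.charCodeAt(1);  // 0xDE00

// codePointAt(0) combines the surrogate pair into the real code point.
const codePoint = emoji.codePointAt(0); // 0x1F600

// Hypothetical naive encoder that walks code units instead of code points.
function naiveEntities(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += "&#x" + text.charCodeAt(i).toString(16).toUpperCase() + ";";
  }
  return out;
}

console.log(codePoint.toString(16).toUpperCase()); // 1F600
console.log(naiveEntities(emoji));                 // &#xD83D;&#xDE00;
```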
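Are there code points that should not be used as hex entities?
Yes. The Unicode standard designates U+D800 through U+DFFF as surrogate code points. They are not Unicode scalar values and must not appear as numeric character references in HTML. Additionally, the noncharacters U+FDD0 to U+FDEF, plus U+nFFFE and U+nFFFF (the last two code points of each plane), are permanently reserved for internal use. The HTML parser reports references to either group as parse errors: surrogate references are replaced with U+FFFD, while noncharacter references keep their code point, so how they display depends on the browser and font. This tool still encodes such code points if they are present in the input.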
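A check for these ranges can run before encoding or after decoding. A minimal sketch, assuming a hypothetical classifyCodePoint helper; the labels are illustrative and cover only the cases described above plus values beyond U+10FFFF.

```javascript
// Hypothetical helper that flags code points which are problematic
// as HTML numeric character references.
function classifyCodePoint(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) return "out of range";
  // Surrogates: the HTML parser substitutes U+FFFD for these references.
  if (cp >= 0xd800 && cp <= 0xdfff) return "surrogate";
  // Noncharacters: U+FDD0..U+FDEF, plus the last two code points of every
  // plane (low 16 bits equal to FFFE or FFFF).
  if ((cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe) {
    return "noncharacter";
  }
  return "ok";
}

for (const cp of [0x41, 0xd83d, 0xfdd0, 0x1fffe, 0x110000]) {
  console.log(cp.toString(16).toUpperCase(), classifyCodePoint(cp));
}
// 41 ok
// D83D surrogate
// FDD0 noncharacter
// 1FFFE noncharacter
// 110000 out of range
```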
When are hex entities still necessary if the page declares UTF-8?
In most modern workflows, declaring <meta charset="UTF-8"> eliminates the need for hex entities. However, hex entities are necessary when: (1) your content passes through a system that strips non-ASCII bytes (some email relay servers) or cannot store them (legacy databases with Latin-1 columns), (2) you embed special characters in XML documents or attribute values where the encoding is uncertain (entities are not decoded inside CDATA sections, so they do not help there), (3) you need to obfuscate email addresses from simple scrapers (e.g., encoding @ as &#x40;), or (4) you generate HTML fragments for insertion into third-party templates where you do not control the charset declaration.
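For cases (1) and (3), a selective encoder that leaves ASCII as-is keeps the output readable while guaranteeing it is pure ASCII. A minimal sketch with a hypothetical encodeNonAscii helper; it does not escape &, <, or the other markup characters, so it is not a substitute for HTML escaping.

```javascript
// Hypothetical selective encoder: ASCII characters (below U+0080) pass
// through unchanged; everything else becomes a hex entity. Characters
// listed in `alsoEncode` are encoded even if they are ASCII.
function encodeNonAscii(text, alsoEncode = "") {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp < 0x80 && !alsoEncode.includes(ch)) {
      out += ch;
    } else {
      out += "&#x" + cp.toString(16).toUpperCase() + ";";
    }
  }
  return out;
}

console.log(encodeNonAscii("Café costs 5€"));       // Caf&#xE9; costs 5&#x20AC;
console.log(encodeNonAscii("me@example.com", "@")); // me&#x40;example.com
```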
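Does letter case matter in hex entities?
No. The HTML specification treats hex digits as case-insensitive: &#x20ac; and &#x20AC; both produce "€", and HTML also accepts an uppercase X (&#X20AC;). XML, like this tool's decoder pattern, accepts only a lowercase x. This tool outputs uppercase hex digits (e.g., &#x20AC;) for consistency with the Unicode standard notation (U+20AC), but lowercase digits are equally valid.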