About

Systems that cannot handle UTF-8 can silently corrupt your content. An unescaped © in an email template or in legacy CMS output can render as garbage bytes, breaking layouts and losing meaning. This tool converts every non-ASCII Unicode code point to its HTML named entity (e.g., &copy;), numeric HTML entity (e.g., &#x00A9;), or CSS escape sequence (e.g., \00A9). ASCII characters in the printable range 0x20 - 0x7E pass through untouched, except for the five HTML-significant characters (& < > " '), which are always escaped. The converter references a dictionary of over 250 named HTML entities defined in the HTML5 specification. Characters without a named entity receive a hexadecimal numeric entity. This tool approximates a build-step encoder; it does not validate whether your target system supports HTML5 named entities versus the older HTML4 subset.


Formulas

The converter iterates each character in the input string. For each character, it extracts the Unicode code point using codePointAt. The decision logic follows:

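\[
\text{output}(\text{char}) =
\begin{cases}
\text{char (unchanged)} & \text{if } \texttt{0x20} \le cp \le \texttt{0x7E} \text{ and } \text{char} \notin \{\texttt{\&},\ \texttt{<},\ \texttt{>},\ \texttt{"},\ \texttt{'}\} \\
\texttt{\&}\,name\,\texttt{;} & \text{if a named entity exists in the map} \\
\texttt{\&\#x}\,HEX\,\texttt{;} & \text{otherwise (HTML mode)} \\
\texttt{\textbackslash}\,HEX & \text{otherwise (CSS mode)}
\end{cases}
\]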

Where cp is the Unicode code point of the character, HEX is the uppercase hexadecimal representation of cp zero-padded to at least 4 digits, and name is the HTML5 named entity key (without the ampersand and semicolon). The five HTML-significant ASCII characters (&, <, >, ", ') are always converted even though they fall in the ASCII printable range, because they have syntactic meaning in HTML/CSS contexts.
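The decision logic above can be sketched in JavaScript roughly as follows. This is a minimal illustration and not the tool's actual source: the function name encodeEntities, the mode names ("named", "numeric", "css"), and the abbreviated NAMED_ENTITIES map are all assumptions made for the example.

```javascript
// Hypothetical, abbreviated entity map (code point -> entity name).
// The tool's real dictionary is described as holding over 250 entries.
const NAMED_ENTITIES = new Map([
  [0x26, "amp"], [0x3C, "lt"], [0x3E, "gt"], [0x22, "quot"], [0x27, "apos"],
  [0xA9, "copy"], [0xAE, "reg"], [0x2122, "trade"], [0x20AC, "euro"],
]);

// The five HTML-significant ASCII characters that are always escaped.
const HTML_SIGNIFICANT = new Set(["&", "<", ">", '"', "'"]);

// Uppercase hex, zero-padded to at least 4 digits.
function toHex(cp) {
  return cp.toString(16).toUpperCase().padStart(4, "0");
}

// mode: "named" | "numeric" | "css" (mode names are assumptions)
function encodeEntities(input, mode = "named") {
  let out = "";
  // for...of walks the string by code point, so surrogate pairs stay together
  for (const ch of input) {
    const cp = ch.codePointAt(0);
    const printableAscii = cp >= 0x20 && cp <= 0x7E;
    if (printableAscii && !HTML_SIGNIFICANT.has(ch)) {
      out += ch;                                   // unchanged
    } else if (mode === "css") {
      out += "\\" + toHex(cp);                     // \HEX
    } else if (mode === "named" && NAMED_ENTITIES.has(cp)) {
      out += "&" + NAMED_ENTITIES.get(cp) + ";";   // &name;
    } else {
      out += "&#x" + toHex(cp) + ";";              // &#xHEX;
    }
  }
  return out;
}
```

In this sketch, "named" mode falls back to a hexadecimal numeric entity whenever the map has no entry, which mirrors the "otherwise (HTML mode)" branch of the formula.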

Reference Data

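| Character | Unicode | HTML Named | HTML Numeric | CSS Entity | Description |
|---|---|---|---|---|---|
| © | U+00A9 | &copy; | &#x00A9; | \00A9 | Copyright Sign |
| ® | U+00AE | &reg; | &#x00AE; | \00AE | Registered Sign |
| ™ | U+2122 | &trade; | &#x2122; | \2122 | Trade Mark Sign |
| & | U+0026 | &amp; | &#x0026; | \0026 | Ampersand |
| < | U+003C | &lt; | &#x003C; | \003C | Less-Than Sign |
| > | U+003E | &gt; | &#x003E; | \003E | Greater-Than Sign |
| " | U+0022 | &quot; | &#x0022; | \0022 | Quotation Mark |
| ' | U+0027 | &apos; | &#x0027; | \0027 | Apostrophe |
| (NBSP) | U+00A0 | &nbsp; | &#x00A0; | \00A0 | Non-Breaking Space |
| — | U+2014 | &mdash; | &#x2014; | \2014 | Em Dash |
| – | U+2013 | &ndash; | &#x2013; | \2013 | En Dash |
| « | U+00AB | &laquo; | &#x00AB; | \00AB | Left Guillemet |
| » | U+00BB | &raquo; | &#x00BB; | \00BB | Right Guillemet |
| € | U+20AC | &euro; | &#x20AC; | \20AC | Euro Sign |
| £ | U+00A3 | &pound; | &#x00A3; | \00A3 | Pound Sign |
| ¥ | U+00A5 | &yen; | &#x00A5; | \00A5 | Yen Sign |
| ¢ | U+00A2 | &cent; | &#x00A2; | \00A2 | Cent Sign |
| • | U+2022 | &bull; | &#x2022; | \2022 | Bullet |
| … | U+2026 | &hellip; | &#x2026; | \2026 | Horizontal Ellipsis |
| ← | U+2190 | &larr; | &#x2190; | \2190 | Left Arrow |
| → | U+2192 | &rarr; | &#x2192; | \2192 | Right Arrow |
| ↑ | U+2191 | &uarr; | &#x2191; | \2191 | Up Arrow |
| ↓ | U+2193 | &darr; | &#x2193; | \2193 | Down Arrow |
| ° | U+00B0 | &deg; | &#x00B0; | \00B0 | Degree Sign |
| µ | U+00B5 | &micro; | &#x00B5; | \00B5 | Micro Sign |
| π | U+03C0 | &pi; | &#x03C0; | \03C0 | Greek Small Pi |
| α | U+03B1 | &alpha; | &#x03B1; | \03B1 | Greek Small Alpha |
| β | U+03B2 | &beta; | &#x03B2; | \03B2 | Greek Small Beta |
| ∞ | U+221E | &infin; | &#x221E; | \221E | Infinity |
| ∑ | U+2211 | &sum; | &#x2211; | \2211 | N-Ary Summation |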

Frequently Asked Questions

Named entities like &copy; use a human-readable alias defined in the HTML specification. Numeric entities like &#x00A9; reference the Unicode code point directly in hexadecimal. Both render identically in browsers. Named entities are more readable in source code but exist only for characters the spec defines a name for: HTML4 defined about 250, HTML5 defines more than 2,000, and this tool's dictionary covers over 250. Numeric entities work for any of the 149,000+ Unicode characters.
CSS entities (e.g., \00A9) are used inside CSS content properties, such as .icon::before { content: "\00A9"; }. HTML entities are invalid inside CSS. Conversely, CSS escape sequences are invalid inside HTML body text. Choose the output type based on where the encoded string will be consumed.
Yes. The converter uses codePointAt() which correctly handles surrogate pairs for characters above U+FFFF. An emoji like 😀 (U+1F600) converts to &#x1F600; in HTML mode or \1F600 in CSS mode. There is no named HTML entity for most emoji, so numeric encoding is used.
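To illustrate why codePointAt() matters here, the short sketch below compares the UTF-16 code-unit view of U+1F600 with its code-point view. The helper name codePoints is hypothetical and is only used for this example.

```javascript
const emoji = "\u{1F600}"; // grinning face, U+1F600

// UTF-16 view: the character is stored as two code units (a surrogate pair).
const units = [emoji.charCodeAt(0), emoji.charCodeAt(1)]
  .map((u) => u.toString(16).toUpperCase());
console.log(emoji.length, units); // 2 [ 'D83D', 'DE00' ]

// Code-point view: codePointAt(0) combines the pair into one value.
const cp = emoji.codePointAt(0);
console.log(cp.toString(16).toUpperCase()); // 1F600

// Array.from iterates a string by code point, not by code unit,
// so each element passed to the callback is one whole character.
function codePoints(text) {
  return Array.from(text, (ch) => ch.codePointAt(0));
}
```

An encoder that looped with charCodeAt() instead would emit two separate entities for the surrogate halves rather than a single entity for the character.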
These five characters (& < > " ') have syntactic meaning in HTML and can break document parsing if left unescaped. For example, a literal & in HTML content can be misinterpreted as the start of an entity reference. The converter always escapes them regardless of mode to produce safe output.
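Code points are padded to a minimum of 4 hex digits. U+00A9 outputs as 00A9, not A9. Code points above U+FFFF use 5 or 6 digits as needed (e.g., U+1F600 outputs as 1F600). This follows the convention used in Unicode charts and keeps the output consistent. Note that 4-digit padding does not by itself remove ambiguity in CSS: a CSS escape consumes up to six hex digits, so \00A9 followed directly by a hex-valid character (0-9, A-F) is still read as one longer escape unless a space follows the escape or the value is padded to six digits.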
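The padding rule can be sketched as a small helper. The name padHex and its optional width parameter are assumptions for illustration. The six-digit variant is not described as part of this tool; it is shown only as one common way to make a CSS escape unambiguous when a hex digit follows it.

```javascript
// Uppercase hex, zero-padded to at least `width` digits (default 4).
// Longer values are never truncated.
function padHex(cp, width = 4) {
  return cp.toString(16).toUpperCase().padStart(width, "0");
}

console.log(padHex(0xA9));     // 00A9
console.log(padHex(0x1F600));  // 1F600

// In CSS an escape takes up to six hex digits, so "\00A9" + "B" would be
// read as the single escape \00A9B. Padding to six digits avoids that,
// because the parser stops after the sixth digit.
const ambiguous = "\\" + padHex(0xA9) + "B";    // \00A9B
const safe = "\\" + padHex(0xA9, 6) + "B";      // \0000A9B
console.log(ambiguous, safe);
```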
This tool converts ALL non-ASCII characters and the five HTML-significant ASCII characters in the input. If you paste raw HTML (markup wrapped around text such as Héllo), the angle brackets and quotes will also be escaped, which will break your markup. To preserve HTML structure, convert only the text content portions separately, not entire HTML documents.
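One way to convert only the text portions is to split the input on tag-shaped substrings and encode the pieces in between. The sketch below is deliberately naive and is an assumption about how one might do this, not a feature of the tool: the function names are hypothetical, the regular expression mishandles a > inside attribute values, comments, and script content, and entities already present in the text would be escaped a second time. A real HTML parser is the safer choice for production use.

```javascript
// Minimal text encoder: non-ASCII characters, control characters, and
// & < > " ' become hexadecimal numeric entities.
function encodeText(text) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const keep = cp >= 0x20 && cp <= 0x7E && !"&<>\"'".includes(ch);
    out += keep
      ? ch
      : "&#x" + cp.toString(16).toUpperCase().padStart(4, "0") + ";";
  }
  return out;
}

// Naive tokenizer: anything shaped like <...> is treated as a tag and
// left untouched; everything between tags is encoded.
function encodeTextOnly(html) {
  return html
    .split(/(<[^>]*>)/)   // the capture group keeps the tags, at odd indices
    .map((part, i) => (i % 2 === 1 ? part : encodeText(part)))
    .join("");
}

console.log(encodeTextOnly('<p class="x">H\u00E9llo</p>'));
// <p class="x">H&#x00E9;llo</p>
```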