Category HTML/XML Utilities

About

Systems that cannot handle UTF-8 can silently corrupt your content. An unescaped © in an email template or a → in legacy CMS output can render as garbage bytes, breaking layouts and losing meaning. This tool converts every non-ASCII Unicode code point to its HTML named entity (e.g., &copy;), numeric HTML entity (e.g., &#x00A9;), or CSS escape sequence (e.g., \00A9). ASCII characters in the printable range 0x20 - 0x7E pass through untouched, except the five HTML-significant characters (& < > " '), which are always escaped. The converter references a dictionary of over 250 named HTML entities defined in the HTML5 specification. Characters without a named entity receive a hexadecimal numeric entity. This tool approximates a build-step encoder; it does not validate whether your target system supports HTML5 named entities versus the older HTML4 subset.

Formulas

The converter iterates each character in the input string. For each character, it extracts the Unicode code point using codePointAt. The decision logic follows:

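\[
\text{output}(char) =
\begin{cases}
char & \text{if } \texttt{0x20} \le cp \le \texttt{0x7E} \text{ and } char \notin \{\texttt{\&},\ \texttt{<},\ \texttt{>},\ \texttt{"},\ \texttt{'}\} \\
\texttt{\&}name\texttt{;} & \text{if named entity exists in map} \\
\texttt{\&\#x}HEX\texttt{;} & \text{otherwise (HTML mode)} \\
\texttt{\textbackslash}HEX & \text{otherwise (CSS mode)}
\end{cases}
\]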

Where cp is the Unicode code point of the character, HEX is the uppercase hexadecimal representation of cp zero-padded to at least 4 digits, and name is the HTML5 named entity key (without the ampersand and semicolon). The five HTML-significant ASCII characters (&, <, >, ", ') are always converted even though they fall in the ASCII printable range, because they have syntactic meaning in HTML.
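The decision logic above can be sketched in JavaScript roughly as follows. This is a minimal illustration, not the tool's actual source: the function name encodeText, the mode parameter, and the ten-entry NAMED map are assumptions made for the example, and the real dictionary is much larger. In CSS mode the sketch skips the named-entity lookup, since named entities only make sense in HTML output.

```javascript
// Illustrative subset of a named-entity map (code point -> entity name).
// The tool's real dictionary is larger; these entries are only examples.
const NAMED = new Map([
  [0x26, "amp"], [0x3C, "lt"], [0x3E, "gt"], [0x22, "quot"], [0x27, "apos"],
  [0xA0, "nbsp"], [0xA9, "copy"], [0xAE, "reg"], [0x2014, "mdash"], [0x2192, "rarr"],
]);

// Uppercase hexadecimal, zero-padded to at least 4 digits.
function toHex(cp) {
  return cp.toString(16).toUpperCase().padStart(4, "0");
}

// mode is "html" or "css" (hypothetical parameter, assumed for this sketch).
function encodeText(input, mode = "html") {
  let out = "";
  for (const ch of input) {                 // for...of walks the string by code point
    const cp = ch.codePointAt(0);
    const significant = "&<>\"'".includes(ch);
    if (cp >= 0x20 && cp <= 0x7E && !significant) {
      out += ch;                            // printable ASCII passes through
    } else if (mode === "css") {
      out += "\\" + toHex(cp);              // CSS escape sequence
    } else if (NAMED.has(cp)) {
      out += "&" + NAMED.get(cp) + ";";     // named entity
    } else {
      out += "&#x" + toHex(cp) + ";";       // hexadecimal numeric entity
    }
  }
  return out;
}

console.log(encodeText("© 2024 → A&B"));    // &copy; 2024 &rarr; A&amp;B
console.log(encodeText("©", "css"));        // \00A9
```

Because the rule only passes through 0x20 to 0x7E, this sketch also encodes control characters such as newlines; whether the tool does the same is not stated in the text.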

Reference Data

Character	Unicode	HTML Named	HTML Numeric	CSS Entity	Description
©	U+00A9	&copy;	&#x00A9;	\00A9	Copyright Sign
®	U+00AE	&reg;	&#x00AE;	\00AE	Registered Sign
™	U+2122	&trade;	&#x2122;	\2122	Trade Mark Sign
&	U+0026	&amp;	&#x0026;	\0026	Ampersand
<	U+003C	&lt;	&#x003C;	\003C	Less-Than Sign
>	U+003E	&gt;	&#x003E;	\003E	Greater-Than Sign
"	U+0022	&quot;	&#x0022;	\0022	Quotation Mark
'	U+0027	&apos;	&#x0027;	\0027	Apostrophe
 	U+00A0	&nbsp;	&#x00A0;	\00A0	Non-Breaking Space
—	U+2014	&mdash;	&#x2014;	\2014	Em Dash
–	U+2013	&ndash;	&#x2013;	\2013	En Dash
«	U+00AB	&laquo;	&#x00AB;	\00AB	Left Guillemet
»	U+00BB	&raquo;	&#x00BB;	\00BB	Right Guillemet
€	U+20AC	&euro;	&#x20AC;	\20AC	Euro Sign
£	U+00A3	&pound;	&#x00A3;	\00A3	Pound Sign
¥	U+00A5	&yen;	&#x00A5;	\00A5	Yen Sign
¢	U+00A2	&cent;	&#x00A2;	\00A2	Cent Sign
•	U+2022	&bull;	&#x2022;	\2022	Bullet
…	U+2026	&hellip;	&#x2026;	\2026	Horizontal Ellipsis
←	U+2190	&larr;	&#x2190;	\2190	Left Arrow
→	U+2192	&rarr;	&#x2192;	\2192	Right Arrow
↑	U+2191	&uarr;	&#x2191;	\2191	Up Arrow
↓	U+2193	&darr;	&#x2193;	\2193	Down Arrow
°	U+00B0	&deg;	&#x00B0;	\00B0	Degree Sign
µ	U+00B5	&micro;	&#x00B5;	\00B5	Micro Sign
π	U+03C0	&pi;	&#x03C0;	\03C0	Greek Small Pi
α	U+03B1	&alpha;	&#x03B1;	\03B1	Greek Small Alpha
β	U+03B2	&beta;	&#x03B2;	\03B2	Greek Small Beta
∞	U+221E	&infin;	&#x221E;	\221E	Infinity
∑	U+2211	&sum;	&#x2211;	\2211	N-Ary Summation

Frequently Asked Questions

Named entities like &copy; use a human-readable alias defined in the HTML specification. Numeric entities like &#x00A9; reference the Unicode code point directly in hexadecimal. Both render identically in browsers. Named entities are more readable in source code but exist only for characters that have a defined name: this tool's dictionary covers roughly 250, while the full HTML5 specification defines over 2,000 named character references. Numeric entities work for any of the 149,000+ Unicode characters.

CSS escape sequences (e.g., \00A9) are used inside CSS, most often in content properties such as .icon::before { content: "\00A9"; }. HTML entities are not interpreted inside CSS; they appear as literal text. Conversely, CSS escape sequences are not interpreted in HTML body text and also appear literally. Choose the output type based on where the encoded string will be consumed.

Yes, emoji and other characters above U+FFFF are handled. The converter uses codePointAt(), which reads a surrogate pair as a single code point. An emoji like 😀 (U+1F600) converts to &#x1F600; in HTML mode or \1F600 in CSS mode. There is no named HTML entity for most emoji, so numeric encoding is used.
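To see why codePointAt() matters here, the short JavaScript sketch below compares it with charCodeAt() on the same emoji. The helper name toHtmlNumeric is hypothetical and exists only for this illustration.

```javascript
const emoji = "\u{1F600}";                    // 😀, stored as two UTF-16 code units

console.log(emoji.length);                          // 2
console.log(emoji.charCodeAt(0).toString(16));      // d83d (high surrogate only)
console.log(emoji.codePointAt(0).toString(16));     // 1f600 (full code point)

// Hypothetical helper: encode every code point as a hexadecimal numeric entity.
// Array.from iterates by code point, so a surrogate pair yields one element.
function toHtmlNumeric(text) {
  return Array.from(text, (ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    return "&#x" + hex + ";";
  }).join("");
}

console.log(toHtmlNumeric(emoji));                  // &#x1F600;
```

Indexing the string with a plain for loop and charCodeAt() would instead emit two entities, one per surrogate half, which is not a valid encoding of the character.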

These five characters (& < > " ') have syntactic meaning in HTML and can break document parsing if left unescaped. For example, a literal & in HTML content can be misinterpreted as the start of an entity reference. The converter always escapes them regardless of mode to produce safe output.

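Code points are padded to a minimum of 4 hex digits. U+00A9 outputs as 00A9, not A9. Code points above U+FFFF use 5 or 6 digits as needed (e.g., U+1F600 outputs as 1F600). This follows the convention used in Unicode charts and keeps the output consistent. In CSS, an escape consumes up to six hex digits, so \A9 is ambiguous if followed by a hex-valid character. Padding to 4 digits does not fully remove that ambiguity: \00A9 directly followed by a hex digit still needs a trailing space or padding to 6 digits.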
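The padding rule, and the case where a CSS escape is followed by a hex-valid character, can be illustrated with a short JavaScript sketch. The function name toCssEscape and its nextChar parameter are assumptions for this example. It relies on the CSS rule that an escape reads up to six hex digits and that one whitespace character after the escape is consumed as a terminator.

```javascript
// Hypothetical helper: build a CSS escape for code point cp.
// nextChar is the character that will follow the escape in the output, if any.
function toCssEscape(cp, nextChar = "") {
  const hex = cp.toString(16).toUpperCase().padStart(4, "0");
  // With fewer than six digits, a following hex digit would be read as part
  // of the escape, and a following whitespace character would be swallowed
  // as the terminator. Adding one space avoids both problems.
  const needsSpace = hex.length < 6 && /^[0-9a-fA-F \t\n]/.test(nextChar);
  return "\\" + hex + (needsSpace ? " " : "");
}

console.log(toCssEscape(0xA9));          // \00A9
console.log(toCssEscape(0xA9, "x"));     // \00A9   ("x" is not a hex digit)
console.log(toCssEscape(0xA9, "B"));     // \00A9 followed by one space
console.log(toCssEscape(0x1F600));       // \1F600
```

Whether the tool itself inserts this terminating space is not stated; the sketch only shows one way to keep such output unambiguous.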

This tool converts ALL non-ASCII characters and the five HTML-significant ASCII characters in the input. If you paste raw HTML, such as markup wrapped around the text Héllo, the angle brackets and quotes will also be escaped, which will break your markup. To preserve HTML structure, convert only the text content portions separately, not entire HTML documents.
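One rough way to convert only the text portions is to split the input on tags and encode the pieces in between. The JavaScript sketch below assumes simple, well-formed markup. The function names and the sample markup are hypothetical, and the regular expression is not a real HTML parser: it will mishandle comments, script blocks, and attribute values containing ">".

```javascript
// Escape the five HTML-significant characters and all non-ASCII code points
// as hexadecimal numeric entities (named entities omitted to keep this short).
function encodeText(text) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const passThrough = cp >= 0x20 && cp <= 0x7E && !"&<>\"'".includes(ch);
    out += passThrough
      ? ch
      : "&#x" + cp.toString(16).toUpperCase().padStart(4, "0") + ";";
  }
  return out;
}

// Split on tags; the capture group keeps each tag at an odd index,
// so even-indexed parts are text content and only those get encoded.
function encodeTextOnly(html) {
  return html
    .split(/(<[^>]*>)/)
    .map((part, i) => (i % 2 === 1 ? part : encodeText(part)))
    .join("");
}

console.log(encodeTextOnly('<p class="x">Héllo</p>'));
// <p class="x">H&#x00E9;llo</p>
```

Text that already contains entities would be escaped a second time, and non-ASCII characters inside attribute values are left untouched, so a DOM-based approach is likely safer for anything beyond simple fragments.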