About

Incorrect character encoding breaks web pages. An unescaped & in HTML source can trigger parse errors. A raw < opens an unintended tag. These are not cosmetic issues: they cause content loss, XSS vulnerabilities, and validation failures against the W3C HTML specification. This converter processes each character with code point c in the range 0 to 255 and outputs its correct HTML entity reference: named (e.g., &amp;) when one exists in the HTML5 named character reference table, or numeric (decimal &#c; or hexadecimal &#xh;, where h is c written in base 16) otherwise.

Two modes are provided. Selective mode encodes only the 5 mandatory characters (&, <, >, ", ') plus non-ASCII characters (code points above 127), which is sufficient for valid HTML output. Full mode encodes every character except tab, line feed, and carriage return, producing output safe for transit through systems that mangle encoding (legacy email gateways, certain CMS platforms, database fields with charset mismatches). Note: this tool targets code points 0 to 255 (ASCII plus ISO 8859-1 / Latin-1). Characters beyond U+00FF are still encoded as decimal or hexadecimal numeric references, but they fall outside the classical ASCII/Extended ASCII range.

Formulas

The conversion algorithm iterates over each character in the input string. For each character at index i, the code point c is extracted.

c = charCodeAt(i)

The encoder then applies one of three output formats based on user selection:

output(c) =
    &name;    if a named entity exists for c
    &#c;      if decimal mode is selected
    &#xh;     if hexadecimal mode is selected, where h = toHex(c)
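The three output formats described above can be sketched as follows. This is a minimal illustration, not the tool's actual source: the `NAMED` table is a hypothetical six-entry subset, the `mode` parameter and the function name `formatEntity` are assumptions, and falling back to decimal when no named entity exists is one possible choice.

```javascript
// Hypothetical subset of the named-entity table (the tool's full table is not shown in the text).
const NAMED = new Map([
  [34, "quot"], [38, "amp"], [39, "apos"],
  [60, "lt"], [62, "gt"], [169, "copy"],
]);

// mode is an assumed parameter: "named", "decimal", or "hex".
function formatEntity(c, mode) {
  if (mode === "named" && NAMED.has(c)) {
    return `&${NAMED.get(c)};`;
  }
  if (mode === "hex") {
    // toString(16) yields lowercase digits, so uppercase them explicitly.
    return `&#x${c.toString(16).toUpperCase()};`;
  }
  // Decimal mode, and the assumed fallback when no named entity exists.
  return `&#${c};`;
}

console.log(formatEntity(38, "named"));    // &amp;
console.log(formatEntity(169, "decimal")); // &#169;
console.log(formatEntity(169, "hex"));     // &#xA9;
```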

In selective mode, only characters satisfying the predicate are encoded:

encode(c) if c ∈ {34, 38, 39, 60, 62} ∨ c > 127
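A sketch of the selective predicate, assuming decimal output for simplicity. The names `SPECIAL`, `needsEncodingSelective`, and `encodeSelective` are illustrative and do not come from the tool.

```javascript
// Code points of ", &, ', <, > as listed in the predicate.
const SPECIAL = new Set([34, 38, 39, 60, 62]);

function needsEncodingSelective(c) {
  return SPECIAL.has(c) || c > 127;
}

function encodeSelective(input) {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const c = input.charCodeAt(i);
    // Encode only characters matching the predicate; pass the rest through.
    out += needsEncodingSelective(c) ? `&#${c};` : input[i];
  }
  return out;
}

console.log(encodeSelective('a<b & "\u00E9"'));
// a&#60;b &#38; &#34;&#233;&#34;
```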

In full mode, every character (except whitespace control characters like newline and tab, which are preserved for readability) is encoded:

encode(c) if c ∉ {9, 10, 13}
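Full mode differs only in the predicate. The sketch below again assumes decimal output; `PRESERVED` and `encodeFull` are hypothetical names.

```javascript
// Tab, line feed, and carriage return are passed through unchanged.
const PRESERVED = new Set([9, 10, 13]);

function encodeFull(input) {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const c = input.charCodeAt(i);
    out += PRESERVED.has(c) ? input[i] : `&#${c};`;
  }
  return out;
}

console.log(JSON.stringify(encodeFull("Hi\n")));
// "&#72;&#105;\n"
```

Note that an ordinary space (code point 32) is not in the preserved set, so under this predicate it is encoded as &#32;.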

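Where c is the value returned by charCodeAt, which equals the Unicode code point for every character in the Basic Multilingual Plane. The hexadecimal conversion uses the toString(16) base conversion and uppercases the result, since toString(16) itself returns lowercase digits. Named entity lookup is performed via a hash map with O(1) average access time, covering 97 standard HTML5 named references.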
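The uppercase hexadecimal step can be illustrated in isolation. The helper name `toHex` follows the formula above; its body is an assumption about how uppercase output would typically be produced in JavaScript.

```javascript
function toHex(c) {
  // Number.prototype.toString(16) returns lowercase digits ("a9"),
  // so toUpperCase() is needed for uppercase output ("A9").
  return c.toString(16).toUpperCase();
}

console.log((169).toString(16)); // a9
console.log(toHex(169));         // A9
console.log(toHex(8364));        // 20AC
```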

Reference Data

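Character	Description	Code Point	Named Entity	Decimal	Hex
&	Ampersand	38	&amp;	&#38;	&#x26;
<	Less-than sign	60	&lt;	&#60;	&#x3C;
>	Greater-than sign	62	&gt;	&#62;	&#x3E;
"	Double quote	34	&quot;	&#34;	&#x22;
'	Apostrophe	39	&apos;	&#39;	&#x27;
 	Non-breaking space	160	&nbsp;	&#160;	&#xA0;
©	Copyright	169	&copy;	&#169;	&#xA9;
®	Registered	174	&reg;	&#174;	&#xAE;
™	Trademark	8482	&trade;	&#8482;	&#x2122;
€	Euro sign	8364	&euro;	&#8364;	&#x20AC;
£	Pound sign	163	&pound;	&#163;	&#xA3;
¥	Yen sign	165	&yen;	&#165;	&#xA5;
¢	Cent sign	162	&cent;	&#162;	&#xA2;
§	Section sign	167	&sect;	&#167;	&#xA7;
¶	Pilcrow (paragraph)	182	&para;	&#182;	&#xB6;
°	Degree sign	176	&deg;	&#176;	&#xB0;
±	Plus-minus	177	&plusmn;	&#177;	&#xB1;
µ	Micro sign	181	&micro;	&#181;	&#xB5;
·	Middle dot	183	&middot;	&#183;	&#xB7;
×	Multiplication	215	&times;	&#215;	&#xD7;
÷	Division	247	&divide;	&#247;	&#xF7;
½	One half	189	&frac12;	&#189;	&#xBD;
¼	One quarter	188	&frac14;	&#188;	&#xBC;
¾	Three quarters	190	&frac34;	&#190;	&#xBE;
«	Left guillemet	171	&laquo;	&#171;	&#xAB;
»	Right guillemet	187	&raquo;	&#187;	&#xBB;
¿	Inverted question mark	191	&iquest;	&#191;	&#xBF;
¡	Inverted exclamation	161	&iexcl;	&#161;	&#xA1;
—	Em dash	8212	&mdash;	&#8212;	&#x2014;
–	En dash	8211	&ndash;	&#8211;	&#x2013;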

Frequently Asked Questions

Named entities like &amp; are human-readable and preferred in hand-maintained HTML. Numeric decimal (&#38;) or hexadecimal (&#x26;) references are universally supported across all XML parsers and are safer in XHTML/SVG contexts where named entity support may be limited. Use numeric references when generating markup programmatically or when targeting XML-based formats.

Selective mode encodes only the 5 characters that are syntactically meaningful in HTML (&, <, >, ", ') plus any character with a code point above 127. This produces minimal, readable output. Full mode encodes every printable character, which is necessary when your content passes through systems with unreliable charset handling, such as legacy email relays, certain database columns set to ASCII-only, or third-party CMS platforms that strip non-ASCII bytes.

Yes. While the tool's reference table focuses on the Latin-1 range (code points 0 to 255), the JavaScript charCodeAt method returns UTF-16 code units up to 65535, which equal the code points for characters in the Basic Multilingual Plane. Characters above U+FFFF, such as most emoji, are stored as surrogate pairs and are encoded as two numeric references, one per surrogate. HTML parsers treat numeric references to surrogates as errors and substitute U+FFFD, so such characters do not survive a round trip. For full Unicode fidelity in these edge cases, consider using codePointAt; this tool handles standard BMP characters correctly.
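The difference between the two methods can be seen with a single emoji. This is a sketch for comparison only; `encodeByCodeUnit` and `encodeByCodePoint` are hypothetical helpers, and only the first mirrors the charCodeAt behaviour described above.

```javascript
// Iterates UTF-16 code units, so a character above U+FFFF yields two references.
function encodeByCodeUnit(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    out += `&#${s.charCodeAt(i)};`;
  }
  return out;
}

// for...of iterates by code point, so each character yields one reference.
function encodeByCodePoint(s) {
  let out = "";
  for (const ch of s) {
    out += `&#${ch.codePointAt(0)};`;
  }
  return out;
}

const grin = "\u{1F600}"; // U+1F600, stored as the surrogate pair D83D DE00

console.log(encodeByCodeUnit(grin));  // &#55357;&#56832;
console.log(encodeByCodePoint(grin)); // &#128512;
```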

Control characters like LF (code point 10), CR (13), and TAB (9) serve as whitespace formatting in the source. Encoding them as &#10;, &#13;, and &#9; would produce valid HTML but destroy the readability of your source code. In normal flow content these characters have no distinct rendering impact (browsers collapse them to spaces), so encoding them adds no safety benefit.

Yes, with caveats. Selective and full modes both encode the double quote (") and ampersand (&), which are the two critical characters inside attribute values. However, if your attribute uses single-quote delimiters, ensure the apostrophe (') is also encoded; selective mode handles this. Always validate your output against the W3C Markup Validation Service for production use.
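As an illustration of the attribute case, the sketch below escapes the five special characters before placing a value inside a single-quoted attribute. The helper `escapeAttr` is hypothetical and covers only those five characters, not the non-ASCII range that selective mode also encodes.

```javascript
// Replace each of & < > " ' with its decimal numeric reference.
function escapeAttr(value) {
  return value.replace(/[&<>"']/g, (ch) => `&#${ch.charCodeAt(0)};`);
}

const value = `it's "quoted" & <b>`;
const html = `<input value='${escapeAttr(value)}'>`;

console.log(html);
// <input value='it&#39;s &#34;quoted&#34; &#38; &#60;b&#62;'>
```

Because the apostrophe is encoded, the value cannot terminate the single-quoted attribute early.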

PHP's htmlspecialchars() with the ENT_QUOTES flag covers the same five special characters as this tool's selective mode, but it leaves non-ASCII characters untouched, whereas selective mode also encodes code points above 127. Another key difference: this converter runs entirely in your browser. No data is transmitted to a server, which matters when encoding sensitive content like API keys or user credentials destined for HTML config files. This tool also offers full-encode mode and hex output, which htmlspecialchars() does not provide natively.