About

Incorrect character encoding breaks web pages. An unescaped & in HTML source can be misread as the start of a character reference and trigger parse errors. A raw < can open an unintended tag. These are not cosmetic issues: they cause content loss, XSS vulnerabilities, and validation failures against the W3C HTML specification. This converter processes each character, with code point c, and outputs its HTML character reference: a named reference (e.g., &amp;) when one exists in the HTML5 named character reference table, or a numeric reference otherwise, written in decimal (&#c;) or hexadecimal (&#xH;, where H is c written in hexadecimal).

Two modes are provided. Selective mode encodes only the five HTML-significant characters (&, <, >, ", ') plus every non-ASCII character (code point above 127), which is sufficient for valid HTML output. Full mode encodes every character except tab, line feed, and carriage return, producing output that survives transit through systems that mangle encodings (legacy email gateways, certain CMS platforms, database fields with charset mismatches). Note: this tool targets code points 0-255 (the ISO 8859-1 / Latin-1 range, a superset of ASCII). Characters beyond U+00FF are still encoded as decimal or hex numeric references, but they fall outside the classical ASCII/Extended ASCII range.


Formulas

The conversion algorithm iterates over each character in the input string. For each character at index i, the code point c is extracted.

c = charCodeAt(i)
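A minimal sketch of this extraction step, assuming a modern JavaScript/TypeScript runtime. The helper name codeUnits is hypothetical; it simply records what charCodeAt returns at each index, which shows that a character above U+FFFF occupies two indices.

```typescript
// Hypothetical helper: returns the value charCodeAt(i) yields at every index i.
// charCodeAt returns UTF-16 code units (0..65535), so a character outside the
// Basic Multilingual Plane occupies two indices (a surrogate pair).
function codeUnits(input: string): number[] {
  const units: number[] = [];
  for (let i = 0; i < input.length; i++) {
    units.push(input.charCodeAt(i));
  }
  return units;
}

console.log(codeUnits("A&\u00E9"));     // values: 65, 38, 233
console.log(codeUnits("\uD83D\uDE00")); // values: 55357, 56832 (U+1F600 as two code units)
```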

The encoder then applies one of three output formats based on user selection:

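$$
\mathrm{encode}(c) =
\begin{cases}
\texttt{\&}\mathit{name}\texttt{;} & \text{if a named entity exists for } c \\
\texttt{\&\#}c\texttt{;} & \text{if decimal mode is selected} \\
\texttt{\&\#x}\,\mathrm{toHex}(c)\texttt{;} & \text{if hexadecimal mode is selected}
\end{cases}
$$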
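The three output formats can be sketched as a single function. This is an illustration, not the tool's code: formatEntity, EntityFormat, and the seven-entry NAMED map are hypothetical (the tool's 97-entry table is not reproduced), and the sketch assumes "named" is a selectable format that falls back to a decimal reference when c has no name.

```typescript
type EntityFormat = "named" | "decimal" | "hex";

// Hypothetical excerpt of a named-entity table, keyed by code unit.
const NAMED: ReadonlyMap<number, string> = new Map<number, string>([
  [34, "quot"],
  [38, "amp"],
  [39, "apos"],
  [60, "lt"],
  [62, "gt"],
  [160, "nbsp"],
  [169, "copy"],
]);

// Format one code unit c as a character reference.
function formatEntity(c: number, format: EntityFormat): string {
  if (format === "named") {
    const name = NAMED.get(c);
    // Assumption: fall back to decimal when no named entity exists.
    return name !== undefined ? `&${name};` : `&#${c};`;
  }
  if (format === "decimal") {
    return `&#${c};`;
  }
  // toString(16) yields lowercase hex digits, so uppercase them explicitly.
  return `&#x${c.toString(16).toUpperCase()};`;
}

console.log(formatEntity(60, "named"));   // &lt;
console.log(formatEntity(60, "hex"));     // &#x3C;
console.log(formatEntity(233, "named"));  // &#233; (no entry in this excerpt)
```

Keeping the named table in a Map gives the average O(1) lookup the formulas describe.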

In selective mode, only characters satisfying the predicate are encoded:

$$\mathrm{encode}(c) \quad \text{if } c \in \{34, 38, 39, 60, 62\} \;\lor\; c > 127$$

In full mode, every character is encoded except the whitespace control characters tab (9), line feed (10), and carriage return (13), which are preserved for readability:

$$\mathrm{encode}(c) \quad \text{if } c \notin \{9, 10, 13\}$$
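Both mode predicates translate directly into code. In this sketch the names Mode, shouldEncode, and encodeDecimal are hypothetical, and output is limited to decimal references for brevity; a real encoder would plug in whichever output format the user selected.

```typescript
type Mode = "selective" | "full";

// The five HTML-significant characters: " & ' < >
const SYNTAX_CHARS = new Set<number>([34, 38, 39, 60, 62]);
// Whitespace controls that full mode leaves literal: TAB, LF, CR
const PRESERVED_WS = new Set<number>([9, 10, 13]);

// Selective: c in {34, 38, 39, 60, 62} or c > 127. Full: c not in {9, 10, 13}.
function shouldEncode(c: number, mode: Mode): boolean {
  if (mode === "selective") {
    return SYNTAX_CHARS.has(c) || c > 127;
  }
  return !PRESERVED_WS.has(c);
}

// Replace each code unit that satisfies the predicate with a decimal reference.
function encodeDecimal(input: string, mode: Mode): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const c = input.charCodeAt(i);
    out += shouldEncode(c, mode) ? `&#${c};` : input.charAt(i);
  }
  return out;
}

console.log(JSON.stringify(encodeDecimal("a<b\n", "selective"))); // "a&#60;b\n"
console.log(JSON.stringify(encodeDecimal("a<b\n", "full")));      // "&#97;&#60;&#98;\n"
```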

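Here c is the UTF-16 code unit returned by charCodeAt, which equals the character's Unicode code point for characters in the Basic Multilingual Plane. The hexadecimal form is produced with toString(16), whose lowercase output is converted to uppercase (e.g., 60 becomes 3C). Named entity lookup uses a hash map with O(1) average access time, covering 97 of the named references defined by HTML5.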

Reference Data

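Character | Description | Code Point | Named Entity | Decimal | Hex
--- | --- | --- | --- | --- | ---
& | Ampersand | 38 | &amp; | &#38; | &#x26;
< | Less-than sign | 60 | &lt; | &#60; | &#x3C;
> | Greater-than sign | 62 | &gt; | &#62; | &#x3E;
" | Double quote | 34 | &quot; | &#34; | &#x22;
' | Apostrophe | 39 | &apos; | &#39; | &#x27;
(U+00A0) | Non-breaking space | 160 | &nbsp; | &#160; | &#xA0;
© | Copyright | 169 | &copy; | &#169; | &#xA9;
® | Registered | 174 | &reg; | &#174; | &#xAE;
™ | Trademark | 8482 | &trade; | &#8482; | &#x2122;
€ | Euro sign | 8364 | &euro; | &#8364; | &#x20AC;
£ | Pound sign | 163 | &pound; | &#163; | &#xA3;
¥ | Yen sign | 165 | &yen; | &#165; | &#xA5;
¢ | Cent sign | 162 | &cent; | &#162; | &#xA2;
§ | Section sign | 167 | &sect; | &#167; | &#xA7;
¶ | Pilcrow (paragraph) | 182 | &para; | &#182; | &#xB6;
° | Degree sign | 176 | &deg; | &#176; | &#xB0;
± | Plus-minus | 177 | &plusmn; | &#177; | &#xB1;
µ | Micro sign | 181 | &micro; | &#181; | &#xB5;
· | Middle dot | 183 | &middot; | &#183; | &#xB7;
× | Multiplication | 215 | &times; | &#215; | &#xD7;
÷ | Division | 247 | &divide; | &#247; | &#xF7;
½ | One half | 189 | &frac12; | &#189; | &#xBD;
¼ | One quarter | 188 | &frac14; | &#188; | &#xBC;
¾ | Three quarters | 190 | &frac34; | &#190; | &#xBE;
« | Left guillemet | 171 | &laquo; | &#171; | &#xAB;
» | Right guillemet | 187 | &raquo; | &#187; | &#xBB;
¿ | Inverted question mark | 191 | &iquest; | &#191; | &#xBF;
¡ | Inverted exclamation | 161 | &iexcl; | &#161; | &#xA1;
— | Em dash | 8212 | &mdash; | &#8212; | &#x2014;
– | En dash | 8211 | &ndash; | &#8211; | &#x2013;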

Frequently Asked Questions

Named entities like &amp; are human-readable and preferred in hand-maintained HTML. Numeric decimal (&#38;) and hexadecimal (&#x26;) references are part of core XML syntax, so every conforming XML parser accepts them. That makes them safer in XHTML and SVG: XML predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;), so a name such as &nbsp; fails to parse there unless a DTD declares it. Use numeric references when generating markup programmatically or when targeting XML-based formats.
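A minimal sketch of an XML-safe encoder that follows this advice: it keeps the five XML-predefined names and emits numeric references for everything above 127. The function name encodeForXml is hypothetical, and the sketch assumes input stays within the Basic Multilingual Plane, since references to surrogate code units are not valid XML characters.

```typescript
// The only named entities XML defines without a DTD.
const XML_PREDEFINED = new Map<number, string>([
  [34, "&quot;"],
  [38, "&amp;"],
  [39, "&apos;"],
  [60, "&lt;"],
  [62, "&gt;"],
]);

// Hypothetical helper; assumes BMP-only input (no surrogate pairs).
function encodeForXml(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const c = input.charCodeAt(i);
    const named = XML_PREDEFINED.get(c);
    if (named !== undefined) {
      out += named;             // one of the five predefined names
    } else if (c > 127) {
      out += `&#${c};`;         // numeric: no DTD needed
    } else {
      out += input.charAt(i);   // plain ASCII passes through
    }
  }
  return out;
}

console.log(encodeForXml("\u00A9 <A&B>")); // &#169; &lt;A&amp;B&gt;
console.log(encodeForXml("x\u00A0y"));     // x&#160;y  (not &nbsp;)
```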
Selective mode encodes only the five characters that are syntactically meaningful in HTML (&, <, >, ", ') plus any character with a code point above 127. This produces minimal, readable output. Full mode encodes every character except tab, line feed, and carriage return, which is necessary when your content passes through systems with unreliable charset handling: legacy email relays, database columns restricted to ASCII, or third-party CMS platforms that strip non-ASCII bytes.
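Yes. The tool's reference table focuses on the Latin-1 range (code points 0-255), but the JavaScript charCodeAt method returns any UTF-16 code unit from 0 to 65535, so every character in the Basic Multilingual Plane (BMP) is encoded correctly. Characters above U+FFFF, including most emoji, are stored as surrogate pairs: charCodeAt returns the two halves separately, and the tool emits one numeric reference per half. HTML parsers treat a reference to a surrogate code point as a parse error and render U+FFFD (the replacement character) instead, so the original character is lost. For full Unicode fidelity in such edge cases, encode by code point with codePointAt.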
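The difference between the two methods shows up with a single emoji. Both function names below are hypothetical; the sketch assumes an ES2015+ runtime, where String.prototype.codePointAt is available, and emits decimal references for every character to keep the comparison visible.

```typescript
// Per-code-unit encoding (charCodeAt): an astral character becomes two
// references to surrogate code points, which HTML parsers replace with U+FFFD.
function encodeByCodeUnit(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    out += `&#${input.charCodeAt(i)};`;
  }
  return out;
}

// Per-code-point encoding (codePointAt): a surrogate pair is read as one
// code point, and the index then skips past its trailing half.
function encodeByCodePoint(input: string): string {
  let out = "";
  let i = 0;
  while (i < input.length) {
    const cp = input.codePointAt(i) as number; // defined because i < length
    out += `&#${cp};`;
    i += cp > 0xffff ? 2 : 1;
  }
  return out;
}

const grinning = "\uD83D\uDE00"; // U+1F600 GRINNING FACE as a surrogate pair
console.log(encodeByCodeUnit(grinning));  // &#55357;&#56832;
console.log(encodeByCodePoint(grinning)); // &#128512;
```

A lone surrogate is returned unchanged by codePointAt, so the second function still advances by one and does not loop forever on malformed input.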
Control characters like LF (code point 10), CR (13), and TAB (9) serve as whitespace formatting in the source. Encoding them (e.g., as &#10;) would destroy the readability of your source code, and the HTML specification flags &#13; in particular as a parse error. Outside <pre> elements and content styled with white-space: pre, browsers collapse these characters into ordinary spaces, and they carry no markup meaning, so encoding them adds no safety benefit.
Yes, with caveats. Both selective and full modes encode the double quote (") and ampersand (&), the two critical characters inside double-quoted attribute values. If your attribute uses single-quote delimiters, the apostrophe (') must also be encoded; selective mode handles this. Always validate your output against the W3C Markup Validation Service before production use.
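A minimal sketch of attribute-value escaping covering the five selective-mode syntax characters, so the result is safe with either quote delimiter. The name escapeAttribute is hypothetical, and unlike selective mode it leaves characters above 127 as-is. It uses &#39; rather than &apos; because HTML 4 did not define &apos;.

```typescript
// Replacement for each HTML-significant character.
const ATTR_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape a value for use inside a single- or double-quoted attribute.
function escapeAttribute(value: string): string {
  return value.replace(/[&<>"']/g, (ch) => ATTR_ESCAPES[ch] ?? ch);
}

const html = `<a title="${escapeAttribute('Tom & "Jerry"')}">link</a>`;
console.log(html); // <a title="Tom &amp; &quot;Jerry&quot;">link</a>
```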
PHP's htmlspecialchars() with the ENT_QUOTES flag encodes the same five syntax characters as this tool's selective mode, but it leaves characters above 127 untouched; htmlentities() with ENT_QUOTES, which also converts characters that have named entities, is the closer match. The key difference: this converter runs entirely in your browser and transmits no data to a server, which matters when encoding sensitive content like API keys or user credentials destined for HTML config files. This tool also offers a full-encode mode and hex output, which neither PHP function provides natively.