ASCII to UTF-8 Converter
Convert ASCII text to UTF-8 encoding with hex dump, byte analysis, and codepoint breakdown. Supports bidirectional conversion and file upload.
About
ASCII encodes 128 characters in 7 bits. UTF-8 extends this to 1,112,064 valid codepoints using a variable-width scheme of 1 to 4 bytes per character. Every valid ASCII string is already valid UTF-8 because UTF-8 preserves the ASCII range (U+0000 to U+007F) as single-byte sequences. The real complexity arises when text contains characters beyond codepoint 127. A misidentified encoding produces mojibake: garbled output caused by interpreting bytes under the wrong scheme. This tool performs real encoding via the browser's native TextEncoder API, exposes the raw byte structure, and flags every non-ASCII character so you can diagnose encoding issues before they corrupt a database or break a data pipeline.
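As a rough illustration of the kind of analysis described above, the following sketch encodes a string with `TextEncoder` and reports each character's codepoint, UTF-8 bytes, and whether it falls outside ASCII. The function name `analyze` and its field names are assumptions made for this example, not the tool's actual source.

```javascript
// Sketch only: `analyze` and its field names are illustrative.
function analyze(text) {
  const encoder = new TextEncoder();
  const rows = [];
  // for...of iterates by codepoint, so surrogate pairs stay together.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const bytes = Array.from(encoder.encode(ch));
    rows.push({
      char: ch,
      codepoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      hex: bytes
        .map((b) => b.toString(16).toUpperCase().padStart(2, "0"))
        .join(" "),
      byteCount: bytes.length,
      isAscii: cp <= 0x7f, // anything above 127 is flagged as non-ASCII
    });
  }
  return rows;
}

// "A" followed by the euro sign (U+20AC), written as an escape
// so the example does not depend on the file's own encoding.
console.log(analyze("A\u20AC"));
```

Iterating with `for...of` rather than by index matters here: indexing a JavaScript string walks UTF-16 code units, which would split supplementary characters such as emoji into two lone surrogates.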
Limitations: this tool operates on valid Unicode strings as represented by JavaScript's internal UTF-16. Lone surrogates and byte sequences that do not form valid UTF-8 will be replaced with the replacement character U+FFFD. If you are debugging raw binary files, examine the hex dump output rather than the decoded text. Pro tip: CSV imports can fail silently on an encoding mismatch. Validate your source encoding here before running bulk inserts.
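The replacement behaviour can be reproduced with the standard `TextEncoder` and `TextDecoder` APIs. The sketch below is a minimal demonstration; the helper name `isValidUtf8` is an assumption for this example. It relies on the documented `fatal` option of `TextDecoder`, which throws on invalid input instead of substituting U+FFFD.

```javascript
const encoder = new TextEncoder();
const lenient = new TextDecoder("utf-8");                 // default: substitute U+FFFD
const strict = new TextDecoder("utf-8", { fatal: true }); // throw on invalid bytes

// A lone high surrogate is not a valid Unicode scalar value,
// so encode() emits the bytes of U+FFFD (EF BF BD) instead.
const loneSurrogateBytes = Array.from(encoder.encode("\uD800"));

// 0xC3 announces a 2-byte sequence, but 0x28 is not a continuation byte.
const invalid = new Uint8Array([0xc3, 0x28]);
const lenientResult = lenient.decode(invalid); // U+FFFD followed by "("

// Hypothetical helper: true when the bytes form well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    strict.decode(bytes);
    return true;
  } catch {
    return false;
  }
}

console.log(loneSurrogateBytes, JSON.stringify(lenientResult), isValidUtf8(invalid));
```

A strict check of this kind is one way to catch a mismatched source encoding before the data reaches an import step, rather than discovering U+FFFD characters afterwards.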
Formulas
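UTF-8 encodes Unicode codepoints into variable-length byte sequences. The number of bytes depends on the codepoint range:
| Codepoint range | Bytes | Byte 1 | Byte 2 | Byte 3 | Byte 4 |
|---|---|---|---|---|---|
| U+0000 to U+007F | 1 | 0xxxxxxx | - | - | - |
| U+0080 to U+07FF | 2 | 110xxxxx | 10xxxxxx | - | - |
| U+0800 to U+FFFF | 3 | 1110xxxx | 10xxxxxx | 10xxxxxx | - |
| U+10000 to U+10FFFF | 4 | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |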
The encoding formula extracts the codepoint value cp and distributes its bits into the template above. For a 2-byte character:
Byte 1 = 0xC0 ∨ (cp >> 6)
Byte 2 = 0x80 ∨ (cp ∧ 0x3F)
Where cp = Unicode codepoint (integer) in the range U+0080 to U+07FF. The >> operator performs a bitwise right shift, ∨ is bitwise OR, and ∧ is bitwise AND, so cp ∧ 0x3F masks the lower 6 bits. ASCII characters (codepoint ≤ 127) pass through unchanged because their single-byte UTF-8 representation is identical to their ASCII value.
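The same shift-and-mask idea extends to 3-byte and 4-byte sequences. The following is a minimal sketch of a manual encoder for a single codepoint; `encodeCodePoint` and `toHex` are illustrative names introduced here, and in practice the native `TextEncoder` does this work.

```javascript
// Sketch: encode one Unicode codepoint into UTF-8 bytes by hand.
function encodeCodePoint(cp) {
  const isSurrogate = cp >= 0xd800 && cp <= 0xdfff;
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff || isSurrogate) {
    throw new RangeError("not a Unicode scalar value: " + cp);
  }
  if (cp <= 0x7f) {
    return [cp]; // ASCII passes through unchanged
  }
  if (cp <= 0x7ff) {
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  }
  if (cp <= 0xffff) {
    return [
      0xe0 | (cp >> 12),
      0x80 | ((cp >> 6) & 0x3f),
      0x80 | (cp & 0x3f),
    ];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

const toHex = (bytes) =>
  bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(toHex(encodeCodePoint(0xfc)));    // C3 BC
console.log(toHex(encodeCodePoint(0x20ac)));  // E2 82 AC
console.log(toHex(encodeCodePoint(0x1f600))); // F0 9F 98 80
```

Surrogate codepoints (U+D800 to U+DFFF) are rejected because they are not valid in UTF-8, which is consistent with the 1,112,064 figure for valid codepoints.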
Reference Data
| Character | ASCII Dec | Unicode Codepoint | UTF-8 Bytes (Hex) | UTF-8 Byte Count | Category |
|---|---|---|---|---|---|
| A | 65 | U+0041 | 41 | 1 | Latin uppercase |
| z | 122 | U+007A | 7A | 1 | Latin lowercase |
| 0 | 48 | U+0030 | 30 | 1 | Digit |
| Space | 32 | U+0020 | 20 | 1 | Whitespace |
| ~ | 126 | U+007E | 7E | 1 | Printable (last ASCII) |
| ¢ | - | U+00A2 | C2 A2 | 2 | Currency symbol |
| £ | - | U+00A3 | C2 A3 | 2 | Currency symbol |
| € | - | U+20AC | E2 82 AC | 3 | Currency symbol |
| © | - | U+00A9 | C2 A9 | 2 | Miscellaneous symbol |
| ° | - | U+00B0 | C2 B0 | 2 | Miscellaneous symbol |
| ü | - | U+00FC | C3 BC | 2 | Latin extended |
| ñ | - | U+00F1 | C3 B1 | 2 | Latin extended |
| α | - | U+03B1 | CE B1 | 2 | Greek |
| Ω | - | U+03A9 | CE A9 | 2 | Greek |
| 世 | - | U+4E16 | E4 B8 96 | 3 | CJK Ideograph |
| А | - | U+0410 | D0 90 | 2 | Cyrillic |
| ☃ | - | U+2603 | E2 98 83 | 3 | Miscellaneous symbol |
| 😀 | - | U+1F600 | F0 9F 98 80 | 4 | Emoji (supplementary) |
| 💩 | - | U+1F4A9 | F0 9F 92 A9 | 4 | Emoji (supplementary) |
| NUL | 0 | U+0000 | 00 | 1 | Control character |
| TAB | 9 | U+0009 | 09 | 1 | Control character |
| LF | 10 | U+000A | 0A | 1 | Control character |
| CR | 13 | U+000D | 0D | 1 | Control character |
| DEL | 127 | U+007F | 7F | 1 | Control character |
| BOM | - | U+FEFF | EF BB BF | 3 | Byte Order Mark |
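Rows in the table above can be spot-checked in the reverse direction by decoding the hex bytes back to text. The sketch below assumes a hypothetical helper named `hexToText` that accepts space-separated hex pairs. It uses the standard `TextDecoder` options `fatal` (reject invalid UTF-8) and `ignoreBOM` (keep a leading U+FEFF instead of stripping it, so the BOM row round-trips).

```javascript
// Sketch: decode space-separated hex bytes (e.g. "E2 82 AC") to a string.
function hexToText(hex) {
  const parts = hex.trim().split(/\s+/);
  for (const p of parts) {
    if (!/^[0-9a-fA-F]{2}$/.test(p)) {
      throw new SyntaxError("not a hex byte: " + p);
    }
  }
  const bytes = Uint8Array.from(parts, (p) => parseInt(p, 16));
  const decoder = new TextDecoder("utf-8", { fatal: true, ignoreBOM: true });
  return decoder.decode(bytes);
}

const euro = hexToText("E2 82 AC");
console.log(euro.codePointAt(0).toString(16)); // 20ac
```

Without `ignoreBOM: true`, a leading EF BB BF would be consumed silently by the decoder and the result for the BOM row would be an empty string.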