Bytes to Unicode Converter
Convert raw bytes (hex, decimal, binary, octal) to Unicode characters with encoding support for UTF-8, UTF-16, ISO-8859-1, ASCII, and more.
About
Incorrect byte interpretation corrupts data silently. A single misidentified encoding turns valid text into mojibake: garbled characters that propagate through databases, APIs, and file systems. This tool decodes raw byte sequences (hex, decimal, binary, octal) into Unicode characters using the standard TextDecoder API, with support for UTF-8, UTF-16LE, UTF-16BE, ISO-8859-1, Windows-1252, and ASCII. It parses multi-byte sequences correctly, handles surrogate pairs in UTF-16, and reports each character's codepoint (U+XXXX), byte length, and category. The tool expects input that is well-formed for the selected encoding. Malformed sequences decode to the replacement character U+FFFD, so errors stay visible in the output rather than passing silently.
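The decode-and-report flow described above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's own code (the tool itself relies on TextDecoder); the helper names `parse_bytes` and `describe` are hypothetical, and the byte length is approximated by re-encoding each decoded character.

```python
import unicodedata


def parse_bytes(text, base=16):
    """Parse space- or comma-separated byte values written in the given base."""
    tokens = text.replace(",", " ").split()
    values = [int(tok, base) for tok in tokens]
    for v in values:
        if not 0 <= v <= 0xFF:
            raise ValueError(f"not a byte value: {v}")
    return bytes(values)


def describe(data, encoding="utf-8"):
    """Decode data and return (char, codepoint, byte length, category) rows."""
    # errors="replace" turns malformed sequences into U+FFFD instead of raising.
    text = data.decode(encoding, errors="replace")
    rows = []
    for ch in text:
        # Byte length is measured by re-encoding the single character.
        # Caveat: for U+FFFD this is the size of the replacement character,
        # not the number of malformed input bytes it stands for.
        n = len(ch.encode(encoding, errors="replace"))
        rows.append((ch, f"U+{ord(ch):04X}", n, unicodedata.category(ch)))
    return rows


for row in describe(parse_bytes("48 C3 A9 F0 9F 98 80")):
    print(row)  # codepoints: U+0048, U+00E9, U+1F600
```

For UTF-16 input, pass an explicit byte order such as `"utf-16-le"`; Python's plain `"utf-16"` codec prepends a BOM when encoding, which would inflate the reported byte length.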
Pro tip: if you receive bytes from a network protocol or binary file and the output looks wrong, try ISO-8859-1 first. It maps every byte (0x00 to 0xFF) to a valid codepoint, so you can confirm the raw data before re-decoding with the correct encoding. Remember that UTF-8 uses 1 to 4 bytes per character. Feeding a UTF-16 stream to a UTF-8 decoder produces nonsense (stray NUL characters or U+FFFD), not an error.
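A small Python sketch of this tip, under the assumption that the bytes are really UTF-8. The function name `latin1_view` and the sample values are made up for illustration.

```python
def latin1_view(data: bytes) -> str:
    """Decode with ISO-8859-1, which never fails: byte N becomes codepoint U+00NN."""
    return data.decode("iso-8859-1")


raw = bytes.fromhex("43 61 66 C3 A9")  # "Café" encoded as UTF-8

# Step 1: the lossless ISO-8859-1 view. It looks like mojibake, but it
# round-trips back to the exact input bytes, which confirms the raw data.
as_latin1 = latin1_view(raw)
assert as_latin1.encode("iso-8859-1") == raw

# Step 2: re-decode the same bytes with the encoding they were written in.
as_utf8 = raw.decode("utf-8")

# A BOM-less UTF-16LE stream of ASCII text is also valid UTF-8, so even a
# strict UTF-8 decoder accepts it and returns text full of NUL characters.
utf16_stream = "Hi".encode("utf-16-le")
misread = utf16_stream.decode("utf-8")

print(repr(as_latin1), repr(as_utf8), repr(misread))
```

The round-trip check in step 1 is the useful part: if the ISO-8859-1 view re-encodes to the same bytes you started with, any remaining garbling comes from the choice of encoding, not from the data.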
Formulas
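Byte-to-character decoding follows the encoding's mapping rules. For UTF-8, the byte length $n$ of a character is determined by the bit pattern of the leading byte $b_0$:

$$
n = \begin{cases}
1 & \text{if } b_0 \in [\texttt{0x00}, \texttt{0x7F}] \quad (\texttt{0xxxxxxx}) \\
2 & \text{if } b_0 \in [\texttt{0xC2}, \texttt{0xDF}] \quad (\texttt{110xxxxx}) \\
3 & \text{if } b_0 \in [\texttt{0xE0}, \texttt{0xEF}] \quad (\texttt{1110xxxx}) \\
4 & \text{if } b_0 \in [\texttt{0xF0}, \texttt{0xF4}] \quad (\texttt{11110xxx})
\end{cases}
$$

Bytes of the form 10xxxxxx (0x80 to 0xBF) are continuation bytes and cannot start a character. 0xC0, 0xC1, and 0xF5 to 0xFF never appear in well-formed UTF-8.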
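The codepoint $cp$ is extracted by masking the payload bits and shifting. For a 2-byte sequence with lead byte $b_0$ and continuation byte $b_1$:

$$
cp = \big((b_0 \mathbin{\&} \texttt{0x1F}) \ll 6\big) \mathbin{|} (b_1 \mathbin{\&} \texttt{0x3F})
$$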
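The lead-byte rule and the mask-and-shift step generalize to 3- and 4-byte sequences. The following Python sketch shows one way to write that; the function names `utf8_length` and `decode_utf8_char` are hypothetical. It is deliberately incomplete: it validates lead and continuation bytes but does not reject overlong forms, encoded surrogates, or values above U+10FFFF, which a conforming decoder must do.

```python
def utf8_length(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte, or 0 if it cannot start one."""
    if lead <= 0x7F:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte (0x80-0xBF) or a byte that is never valid


def decode_utf8_char(data: bytes, i: int = 0):
    """Decode one codepoint starting at data[i]; return (codepoint, length)."""
    n = utf8_length(data[i])
    if n == 0:
        raise ValueError(f"invalid lead byte 0x{data[i]:02X}")
    if i + n > len(data):
        raise ValueError("truncated sequence")
    if n == 1:
        return data[i], 1
    # Payload mask of the lead byte: 0x1F, 0x0F, 0x07 for n = 2, 3, 4.
    cp = data[i] & (0xFF >> (n + 1))
    for b in data[i + 1:i + n]:
        if (b & 0xC0) != 0x80:
            raise ValueError(f"invalid continuation byte 0x{b:02X}")
        cp = (cp << 6) | (b & 0x3F)  # append 6 payload bits per continuation byte
    return cp, n


print(decode_utf8_char(bytes.fromhex("C3A9")))    # (233, 2), i.e. U+00E9
print(decode_utf8_char(bytes.fromhex("E282AC")))  # (8364, 3), i.e. U+20AC
```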
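For UTF-16, codepoints above U+FFFF use surrogate pairs. High surrogate $H$ and low surrogate $L$ reconstruct the codepoint:

$$
cp = \texttt{0x10000} + (H - \texttt{0xD800}) \times \texttt{0x400} + (L - \texttt{0xDC00})
$$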
Where H ∈ [0xD800, 0xDBFF] and L ∈ [0xDC00, 0xDFFF]. Single-byte encodings (ISO-8859-1, Windows-1252) use a direct 1:1 lookup table mapping each byte value to its codepoint.
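As a rough Python sketch of the surrogate-pair reconstruction and the single-byte case: the function names are hypothetical, unpaired surrogates are replaced with U+FFFD as an assumption about the desired behavior, and only ISO-8859-1 is shown for the lookup case, since its table is the identity mapping (Windows-1252 needs an explicit table for 0x80 to 0x9F).

```python
def combine_surrogates(high: int, low: int) -> int:
    """Rebuild a codepoint above U+FFFF from a UTF-16 surrogate pair."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: 0x{high:04X}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: 0x{low:04X}")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def decode_utf16(data: bytes, little_endian: bool = True) -> str:
    """Decode BOM-less UTF-16 by hand, pairing surrogates as they appear."""
    if len(data) % 2:
        raise ValueError("UTF-16 input must have an even number of bytes")
    order = "little" if little_endian else "big"
    units = [int.from_bytes(data[i:i + 2], order) for i in range(0, len(data), 2)]
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if 0xD800 <= u <= 0xDBFF and has_low:
            out.append(chr(combine_surrogates(u, units[i + 1])))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            out.append("\ufffd")  # unpaired surrogate (assumed policy: replace)
            i += 1
        else:
            out.append(chr(u))
            i += 1
    return "".join(out)


def decode_latin1(data: bytes) -> str:
    """ISO-8859-1 lookup: byte value N maps directly to codepoint U+00NN."""
    return "".join(chr(b) for b in data)


print(hex(combine_surrogates(0xD83D, 0xDE00)))  # 0x1f600
```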
Reference Data
| Encoding | Byte Range | Max Bytes/Char | BOM | Coverage | Common Use |
|---|---|---|---|---|---|
| ASCII | 0x00 - 0x7F | 1 | None | 128 characters | Legacy protocols, plain English text |
| UTF-8 | 0x00 - 0xF4 (lead) | 4 | 0xEF 0xBB 0xBF | All Unicode (1,114,112 codepoints) | Web (HTML, JSON, XML), Linux, modern APIs |
| UTF-16LE | 0x00 - 0xFF per byte | 4 | 0xFF 0xFE | All Unicode | Windows internals, Java strings, .NET |
| UTF-16BE | 0x00 - 0xFF per byte | 4 | 0xFE 0xFF | All Unicode | Network protocols (big-endian default) |
| ISO-8859-1 | 0x00 - 0xFF | 1 | None | 256 characters | Western European, HTTP default fallback |
| Windows-1252 | 0x00 - 0xFF | 1 | None | 251 characters | Legacy Windows apps, email, old Word docs |
| UTF-32LE | 4 bytes fixed | 4 | 0xFF 0xFE 0x00 0x00 | All Unicode | Internal processing, Python 3 (CPython) |
| UTF-32BE | 4 bytes fixed | 4 | 0x00 0x00 0xFE 0xFF | All Unicode | Rare, academic use |
| KOI8-R | 0x00 - 0xFF | 1 | None | Russian Cyrillic | Unix/Linux Russian locale, older email |
| Shift_JIS | 0x00 - 0xFF | 2 | None | JIS X 0208 + ASCII | Japanese Windows, legacy web pages |
| EUC-KR | 0x00 - 0xFF | 2 | None | KS X 1001 + ASCII | Korean legacy systems |
| GB2312 / GBK | 0x00 - 0xFF | 2 | None | Simplified Chinese | Chinese websites, legacy systems |
| Big5 | 0x00 - 0xFF | 2 | None | Traditional Chinese | Taiwan/HK legacy systems |
| ISO-8859-5 | 0x00 - 0xFF | 1 | None | Cyrillic subset | Eastern European legacy |
| ISO-8859-15 | 0x00 - 0xFF | 1 | None | 256 characters (€ sign added) | Updated Western European |
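The BOM column can be turned into a simple detector. The Python sketch below is a minimal illustration with hypothetical names (`BOMS`, `sniff_bom`). It checks the longer UTF-32 marks before the UTF-16 ones because the UTF-32LE BOM starts with the UTF-16LE BOM; as a result, UTF-16LE text that begins with a BOM followed by U+0000 would be misreported as UTF-32LE.

```python
# BOM byte sequences from the table, longest first so that the UTF-32LE
# mark is tested before the UTF-16LE mark it begins with.
BOMS = [
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) when no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


data = b"\xff\xfeH\x00i\x00"
name, skip = sniff_bom(data)
if name is not None:
    print(name, data[skip:].decode(name))  # utf-16-le Hi
```

A missing BOM says nothing about the encoding: the single-byte and legacy multi-byte encodings in the table never carry one, and UTF-8 files frequently omit it.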