About

Incorrect byte interpretation corrupts data silently. A single misidentified encoding transforms valid text into mojibake - garbled characters that propagate through databases, APIs, and file systems. This tool decodes raw byte sequences (hex, decimal, binary, octal) into Unicode characters using the actual TextDecoder API with support for UTF-8, UTF-16LE, UTF-16BE, ISO-8859-1, Windows-1252, and ASCII. It parses multi-byte sequences correctly, handles surrogate pairs in UTF-16, and reports each character's codepoint (U+XXXX), byte length, and category. The tool assumes well-formed input per the selected encoding. Malformed sequences produce the replacement character U+FFFD rather than failing silently.

Pro tip: if you receive bytes from a network protocol or binary file and the output looks wrong, try ISO-8859-1 first. It maps every single byte (0x00 - 0xFF) to a valid codepoint, so you can confirm the raw byte values before re-decoding with the correct encoding. Remember that UTF-8 uses 1 to 4 bytes per character. Feeding a UTF-16 stream to a UTF-8 decoder in its default replacement mode will produce nonsense, not an error.
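The inspect-then-re-decode workflow from the tip can be sketched in a few lines. This is a minimal illustration using Python's built-in codecs as a stand-in for the TextDecoder API, which the tool itself uses; the sample bytes and variable names are assumptions chosen for the demo.

```python
# Sample input: "é" encoded as UTF-16LE (an assumption for this demo).
raw = "é".encode("utf-16-le")  # two bytes: 0xE9 0x00

# Step 1: ISO-8859-1 never fails, so it shows exactly which byte values arrived.
inspected = raw.decode("iso-8859-1")
codepoints = [f"U+{ord(ch):04X}" for ch in inspected]

# Step 2: the wrong decoder (UTF-8) in replacement mode yields U+FFFD, not an exception.
wrong = raw.decode("utf-8", errors="replace")

# Step 3: re-decode with the encoding the sender actually used.
right = raw.decode("utf-16-le")

print(codepoints, repr(wrong), right)
```

Note that Python's default strict mode would raise UnicodeDecodeError in step 2; `errors="replace"` is used here to mirror the replacement behavior described above.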


Formulas

Byte-to-character decoding follows the encoding's mapping rules. For UTF-8, the byte length of a character is determined by the leading byte pattern:

1 byte: 0xxxxxxx → U+0000 - U+007F
2 bytes: 110xxxxx 10xxxxxx → U+0080 - U+07FF
3 bytes: 1110xxxx 10xxxxxx 10xxxxxx → U+0800 - U+FFFF
4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx → U+10000 - U+10FFFF
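As a rough sketch of how the leading-byte patterns translate into code, the hypothetical helper below (`utf8_sequence_length` is an illustrative name, not part of the tool) classifies a single byte by its high bits.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length implied by a UTF-8 lead byte, or 0 if the
    byte cannot start a sequence."""
    if (lead & 0b1000_0000) == 0b0000_0000:   # 0xxxxxxx
        return 1
    if (lead & 0b1110_0000) == 0b1100_0000:   # 110xxxxx
        return 2
    if (lead & 0b1111_0000) == 0b1110_0000:   # 1110xxxx
        return 3
    if (lead & 0b1111_1000) == 0b1111_0000:   # 11110xxx
        return 4
    return 0  # continuation byte (10xxxxxx) or 11111xxx

print([utf8_sequence_length(b) for b in (0x41, 0xC3, 0xE4, 0xF0, 0x80)])
```

This checks only the bit pattern. A full decoder would also reject lead bytes such as 0xC0, 0xC1, and 0xF5 and above, which match a pattern but never occur in well-formed UTF-8.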

The codepoint cp is extracted by masking the payload bits and shifting. For a 2-byte sequence with lead byte b0 and continuation byte b1:

cp = ((b0 & 0x1F) << 6) | (b1 & 0x3F)
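A worked instance of the 2-byte case, assuming the bytes 0xC3 0xA4 (the UTF-8 encoding of ä) as input. The function name is illustrative only.

```python
def decode_two_byte(b0: int, b1: int) -> int:
    """Combine a 2-byte UTF-8 sequence into a codepoint.

    The shift is parenthesized on purpose: in Python, + binds tighter than <<,
    so an unparenthesized expression would shift by the wrong amount.
    """
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)

# 0xC3 & 0x1F = 0x03, shifted left by 6 = 0xC0
# 0xA4 & 0x3F = 0x24
# 0xC0 | 0x24 = 0xE4, i.e. U+00E4
cp = decode_two_byte(0xC3, 0xA4)
print(f"U+{cp:04X}", chr(cp))
```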

For UTF-16, codepoints above U+FFFF use surrogate pairs. High surrogate H and low surrogate L reconstruct the codepoint:

cp = (H − 0xD800) × 0x400 + (L − 0xDC00) + 0x10000

Where H ∈ [0xD800, 0xDBFF] and L ∈ [0xDC00, 0xDFFF]. Single-byte encodings (ISO-8859-1, Windows-1252) use a direct 1:1 lookup table mapping each byte value to its codepoint.
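The surrogate-pair reconstruction can be checked with a short sketch. `combine_surrogates` is a hypothetical helper name, and the sample pair 0xD83D 0xDE00 is chosen for illustration.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Rebuild a codepoint above U+FFFF from a UTF-16 surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000

# (0xD83D - 0xD800) * 0x400 = 0x3D * 0x400 = 0xF400
# (0xDE00 - 0xDC00)         = 0x200
# 0xF400 + 0x200 + 0x10000  = 0x1F600
cp = combine_surrogates(0xD83D, 0xDE00)
print(f"U+{cp:X}")
```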

Reference Data

Encoding | Byte Range | Max Bytes/Char | BOM | Coverage | Common Use
--- | --- | --- | --- | --- | ---
ASCII | 0x00 - 0x7F | 1 | None | 128 characters | Legacy protocols, plain English text
UTF-8 | 0x00 - 0xF4 (lead) | 4 | 0xEF 0xBB 0xBF | All Unicode (1,114,112 codepoints) | Web (HTML, JSON, XML), Linux, modern APIs
UTF-16LE | 0x00 - 0xFF per byte | 4 | 0xFF 0xFE | All Unicode | Windows internals, Java strings, .NET
UTF-16BE | 0x00 - 0xFF per byte | 4 | 0xFE 0xFF | All Unicode | Network protocols (big-endian default)
ISO-8859-1 | 0x00 - 0xFF | 1 | None | 256 characters | Western European, HTTP default fallback
Windows-1252 | 0x00 - 0xFF | 1 | None | 251 characters | Legacy Windows apps, email, old Word docs
UTF-32LE | 4 bytes fixed | 4 | 0xFF 0xFE 0x00 0x00 | All Unicode | Internal processing, Python 3 (CPython)
UTF-32BE | 4 bytes fixed | 4 | 0x00 0x00 0xFE 0xFF | All Unicode | Rare, academic use
KOI8-R | 0x00 - 0xFF | 1 | None | Russian Cyrillic | Unix/Linux Russian locale, older email
Shift_JIS | 0x00 - 0xFF | 2 | None | JIS X 0208 + ASCII | Japanese Windows, legacy web pages
EUC-KR | 0x00 - 0xFF | 2 | None | KS X 1001 + ASCII | Korean legacy systems
GB2312 / GBK | 0x00 - 0xFF | 2 | None | Simplified Chinese | Chinese websites, legacy systems
Big5 | 0x00 - 0xFF | 2 | None | Traditional Chinese | Taiwan/HK legacy systems
ISO-8859-5 | 0x00 - 0xFF | 1 | None | Cyrillic subset | Eastern European legacy
ISO-8859-15 | 0x00 - 0xFF | 1 | None | 256 characters (€ sign added) | Updated Western European
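The BOM column can be turned into a simple sniffing routine. The sketch below is one possible approach, not the tool's own logic: `sniff_bom` and `BOMS` are illustrative names, and the BOM constants come from Python's standard `codecs` module.

```python
import codecs
from typing import Optional, Tuple

# UTF-32 entries come first: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so the longer signature must be tested before the shorter.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding name, BOM length), or (None, 0) if no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

print(sniff_bom(b"\xef\xbb\xbfhi"))
print(sniff_bom(b"hi"))
```

A missing BOM says nothing about the encoding. The single-byte and legacy multi-byte encodings in the table have no signature at all, so the encoding must still be agreed on out of band.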

Frequently Asked Questions

The TextDecoder API inserts the Unicode Replacement Character U+FFFD (๏ฟฝ) at each position where a malformed or incomplete sequence is detected. This is the standard behavior defined by the Unicode specification. If you see clusters of ๏ฟฝ in the output, your bytes likely belong to a different encoding than the one selected. Try ISO-8859-1 first to see the raw byte-to-codepoint mapping, then switch to the correct encoding.
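To see the replacement behavior on a small input, here is a sketch using Python's codecs with `errors="replace"` as an approximation of the decoder's default mode. The four sample bytes are an assumption for the demo.

```python
# "Hi", then a lone 0xE4 (a UTF-8 lead byte with no continuation bytes), then "!".
data = bytes([0x48, 0x69, 0xE4, 0x21])

# Decoded as UTF-8, the orphaned lead byte becomes U+FFFD.
as_utf8 = data.decode("utf-8", errors="replace")

# Decoded as ISO-8859-1, every byte maps to a codepoint, so 0xE4 shows up as U+00E4.
as_latin1 = data.decode("iso-8859-1")

print(repr(as_utf8), repr(as_latin1), as_utf8.count("\ufffd"))
```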
If your input begins with BOM bytes (e.g., 0xEF 0xBB 0xBF for UTF-8, or 0xFF 0xFE for UTF-16LE), the TextDecoder will consume them silently and not include the BOM character U+FEFF in the output. This matches real-world decoding behavior. If you want to inspect the BOM bytes themselves, select ISO-8859-1, which treats every byte independently: a UTF-8 BOM then appears as the three characters ï»¿ (U+00EF U+00BB U+00BF) rather than as U+FEFF.
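The BOM handling can be illustrated with Python's codecs, with one caveat: Python's plain `utf-8` codec keeps the BOM as U+FEFF, while its `utf-8-sig` codec strips it, which is the closer match to the stripping behavior described here. The sample bytes are an assumption for the demo.

```python
data = b"\xef\xbb\xbfA"  # UTF-8 BOM followed by "A"

stripped = data.decode("utf-8-sig")    # BOM consumed, as the tool does by default
kept = data.decode("utf-8")            # Python's plain codec keeps U+FEFF
raw_view = data.decode("iso-8859-1")   # each BOM byte becomes its own character

print(repr(stripped), repr(kept), repr(raw_view))
```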
Each encoding defines its own mapping table from byte values to Unicode codepoints. For example, byte 0xE4 maps to U+00E4 (ä) in ISO-8859-1 but is a UTF-8 lead byte indicating a 3-byte sequence, typically the start of a CJK character. In Windows-1252, byte 0x80 maps to the Euro sign € (U+20AC), while ISO-8859-1 defines it as a control character. The encoding is not metadata attached to the bytes; it is an external contract between sender and receiver.
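The same byte values decoded under different contracts can be compared directly. This is a small sketch with Python's codecs. `cp1252` is Python's name for Windows-1252, and the 3-byte sample 0xE4 0xB8 0xAD is an illustrative choice.

```python
# Byte 0xE4 on its own is a complete character in ISO-8859-1.
latin1_e4 = bytes([0xE4]).decode("iso-8859-1")          # ä, U+00E4

# In UTF-8, 0xE4 only starts a 3-byte sequence; with two continuation bytes
# it forms a CJK character.
utf8_cjk = bytes([0xE4, 0xB8, 0xAD]).decode("utf-8")    # 中, U+4E2D

# Byte 0x80: Euro sign in Windows-1252, C1 control character in ISO-8859-1.
cp1252_80 = bytes([0x80]).decode("cp1252")              # €, U+20AC
latin1_80 = bytes([0x80]).decode("iso-8859-1")          # U+0080

for ch in (latin1_e4, utf8_cjk, cp1252_80, latin1_80):
    print(f"U+{ord(ch):04X}")
```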
Emoji are supported. For example, 😀 (Grinning Face) is encoded as 4 bytes in UTF-8: 0xF0 0x9F 0x98 0x80. Enter these bytes in any supported format (hex, decimal, binary, octal) and select UTF-8 encoding. The tool will reconstruct codepoint U+1F600 and display the emoji. In UTF-16, the same character requires a surrogate pair: 0xD83D 0xDE00 (2 code units, 4 bytes total).
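The emoji example can be verified end to end. This is a sketch using Python's codecs, and the variable names are illustrative.

```python
utf8_bytes = bytes([0xF0, 0x9F, 0x98, 0x80])

# Decode the 4-byte UTF-8 sequence into a single character.
ch = utf8_bytes.decode("utf-8")
codepoint = ord(ch)

# Re-encode as big-endian UTF-16 and split into 16-bit code units.
utf16_bytes = ch.encode("utf-16-be")
code_units = [int.from_bytes(utf16_bytes[i:i + 2], "big")
              for i in range(0, len(utf16_bytes), 2)]

print(f"U+{codepoint:X}", [f"0x{u:04X}" for u in code_units], len(utf16_bytes))
```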
The converter accepts four formats, auto-detected by prefix: hexadecimal (0xFF or plain FF), decimal (255), octal (0o377), and binary (0b11111111). Values can be separated by spaces, commas, or newlines. Mixed formats are supported in a single input. Each token must resolve to a value in the range 0 - 255 (0x00 - 0xFF). Values outside this range are flagged as errors.
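One way such prefix-based detection could work is sketched below. The tool's actual rules are not documented here, so the tie-breaking is an assumption: prefixed tokens win, bare all-digit tokens are read as decimal, and any other bare token is read as hex. `parse_byte_tokens` and `PATTERNS` are hypothetical names.

```python
import re
from typing import List

# Tried in order; the first pattern that matches the whole token decides the base.
PATTERNS = [
    (re.compile(r"0x([0-9a-f]+)"), 16),  # 0xFF
    (re.compile(r"0o([0-7]+)"), 8),      # 0o377
    (re.compile(r"0b([01]+)"), 2),       # 0b11111111
    (re.compile(r"([0-9]+)"), 10),       # assumption: bare digits are decimal
    (re.compile(r"([0-9a-f]+)"), 16),    # assumption: other bare tokens are hex
]

def parse_byte_tokens(text: str) -> List[int]:
    """Parse space-, comma-, or newline-separated tokens into byte values."""
    values = []
    for token in re.split(r"[\s,]+", text.strip()):
        if not token:
            continue
        for pattern, base in PATTERNS:
            match = pattern.fullmatch(token.lower())
            if match:
                value = int(match.group(1), base)
                break
        else:
            raise ValueError(f"unrecognized token: {token!r}")
        if value > 0xFF:
            raise ValueError(f"value out of byte range: {token!r}")
        values.append(value)
    return values

print(parse_byte_tokens("0x48, 105 0o41\n0b00100001 FF"))
```

Under these assumptions a bare token such as `10` is read as decimal ten, not hex sixteen, so adding the `0x` prefix is the unambiguous way to enter hex values made only of digits.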
To convert text back to bytes, use the Reverse Mode toggle. Paste or type Unicode text, select the target encoding, and the tool outputs the corresponding byte sequence in your chosen format (hex, decimal, binary, octal). This is useful for verifying round-trip consistency: convert bytes → text, then text → bytes, and confirm the byte sequences match. Note that not every character can be represented in every encoding. For example, the character æ (U+00E6) exists in ISO-8859-1 but not in ASCII, so encoding it as ASCII produces an encoding error.
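The round-trip check and the encoding failure can both be reproduced with Python's codecs. This is a sketch. `round_trips` is a hypothetical helper, and the tool's own error reporting may look different.

```python
def round_trips(data: bytes, encoding: str) -> bool:
    """Decode bytes to text, re-encode, and report whether the bytes match."""
    return data.decode(encoding).encode(encoding) == data

utf8_ok = round_trips(bytes([0xC3, 0xA6]), "utf-8")   # UTF-8 bytes of U+00E6

# U+00E6 fits in ISO-8859-1 as the single byte 0xE6 ...
latin1_bytes = "\u00e6".encode("iso-8859-1")

# ... but ASCII has no mapping for it, so strict encoding raises an error.
try:
    "\u00e6".encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(utf8_ok, latin1_bytes.hex(), ascii_failed)
```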