Binary to Unicode Converter
Convert binary numbers to Unicode text and back. Supports UTF-8, UTF-16, and UTF-32 encoding with delimiter detection and real-time preview.
About
Binary-to-Unicode conversion is not a trivial bit-flip. A byte boundary misaligned by even one bit corrupts every subsequent character in a UTF-8 stream. UTF-8 is a variable-width encoding: an ASCII character fits in 1 byte (7 significant bits), but a Chinese character like "漢" requires 3 bytes (24 bits), and an emoji like "🚀" requires 4 bytes (32 bits). This tool implements the full RFC 3629 (UTF-8), RFC 2781 (UTF-16, with surrogate pair handling for code points above U+FFFF), and UTF-32 decoding pipelines. It validates continuation byte patterns, rejects overlong encodings, and flags orphaned surrogates instead of silently producing garbage.
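The validation steps named above (continuation byte patterns, overlong encodings, surrogates) can be illustrated with a minimal Python sketch. This is not the tool's actual source; the function name `decode_utf8` and the error messages are hypothetical, and only the checks themselves follow RFC 3629.

```python
def decode_utf8(data: bytes) -> str:
    """Strictly decode UTF-8; raise ValueError naming the offset of the first bad sequence."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The leading byte fixes the sequence length and the smallest
        # code point that legitimately needs that many bytes.
        if lead < 0x80:
            length, cp, min_cp = 1, lead, 0x0
        elif lead >> 5 == 0b110:
            length, cp, min_cp = 2, lead & 0x1F, 0x80
        elif lead >> 4 == 0b1110:
            length, cp, min_cp = 3, lead & 0x0F, 0x800
        elif lead >> 3 == 0b11110:
            length, cp, min_cp = 4, lead & 0x07, 0x10000
        else:
            raise ValueError(f"invalid leading byte at offset {i}")
        if i + length > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1 : i + length]:
            if b >> 6 != 0b10:  # continuation bytes must look like 10xxxxxx
                raise ValueError(f"bad continuation byte in sequence at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < min_cp:
            raise ValueError(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code point at offset {i}")
        if cp > 0x10FFFF:
            raise ValueError(f"code point out of range at offset {i}")
        chars.append(chr(cp))
        i += length
    return "".join(chars)
```

For example, `decode_utf8(bytes([0xE6, 0xBC, 0xA2]))` yields "漢", while the overlong sequence `b"\xc0\x80"` is rejected rather than decoded as NUL.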
Manual conversion with a calculator is error-prone because you must track byte boundaries, endianness, and multi-byte state across potentially thousands of bits. A miscount of even one bit shifts the entire frame. This converter auto-detects delimiters, validates binary input structure against the selected encoding, and reports the exact position of malformed sequences. It also operates in reverse: paste any Unicode text to get its precise binary representation in your chosen encoding. Limitations: this tool assumes big-endian byte order for UTF-16 and UTF-32. It does not handle Byte Order Marks (BOM) in input.
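As a rough illustration of delimiter handling and error reporting, the sketch below strips an assumed set of delimiters, packs the bits into bytes, and relies on Python's built-in codecs for decoding. The function name `binary_to_text` and the delimiter set are assumptions, not the tool's actual behavior; the `-be` codec names match the big-endian limitation stated above.

```python
import re


def binary_to_text(bits: str, encoding: str = "utf-8") -> str:
    """Decode a delimited (or undelimited) string of 0s and 1s into text."""
    # Assumed delimiter set: whitespace, commas, semicolons, pipes, underscores, hyphens.
    digits = re.sub(r"[\s,;|_\-]+", "", bits)
    bad = re.search(r"[^01]", digits)
    if bad:
        raise ValueError(f"unexpected character {bad.group()!r} in binary input")
    if len(digits) % 8 != 0:
        raise ValueError(f"{len(digits)} bits is not a whole number of bytes")
    raw = bytes(int(digits[i:i + 8], 2) for i in range(0, len(digits), 8))
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the byte offset at which the codec rejected the input.
        raise ValueError(
            f"malformed {encoding} sequence at byte {err.start}: {err.reason}"
        ) from err
```

Calling `binary_to_text("11011000,00111101,11011110,10000000", "utf-16-be")` decodes the surrogate pair for U+1F680. A dangling lead byte such as `"01000001 11000011"` raises an error that names byte offset 1.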
Formulas
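UTF-8 encodes code points into 1 to 4 bytes. The leading byte determines the sequence length. Continuation bytes always match the pattern 10xxxxxx. The bit template for each range, as defined in RFC 3629:
| Code Point Range | Byte 1 | Byte 2 | Byte 3 | Byte 4 |
|---|---|---|---|---|
| U+0000 to U+007F | 0xxxxxxx | | | |
| U+0080 to U+07FF | 110xxxxx | 10xxxxxx | | |
| U+0800 to U+FFFF | 1110xxxx | 10xxxxxx | 10xxxxxx | |
| U+10000 to U+10FFFF | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |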
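
To make the byte layout concrete, here is a small Python sketch that encodes one code point by hand, following the RFC 3629 bit patterns. The name `encode_utf8` is hypothetical; in practice `chr(cp).encode("utf-8")` does the same job.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point into 1 to 4 UTF-8 bytes."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not an encodable Unicode scalar value")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

Each branch masks off six payload bits per continuation byte and puts the remaining high bits in the leading byte, so `encode_utf8(0x20AC)` gives the three bytes E2 82 AC.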
For UTF-16 surrogate pairs (code points above U+FFFF):
High Surrogate = 0xD800 + (U′ >> 10)
Low Surrogate = 0xDC00 + (U′ & 0x3FF)
Where U is the Unicode code point and U′ = U − 0x10000 is its 20-bit offset into the supplementary planes. The x bits in the UTF-8 template are filled with the binary representation of the code point, most significant bit first. UTF-32 is a direct 32-bit representation of the code point with zero-padding.
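The surrogate arithmetic can be checked with a short Python sketch. It subtracts 0x10000 from the code point before splitting, as RFC 2781 specifies; the function names `to_surrogates` and `from_surrogates` are hypothetical.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} does not need a surrogate pair")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair, rejecting orphaned or swapped halves."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"0x{high:04X} is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"0x{low:04X} is not a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For U+1F680 the offset is 0xF680, which gives the pair 0xD83D 0xDE80; `from_surrogates(0xD83D, 0xDE80)` returns 0x1F680 again.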
Reference Data
| Character | Code Point | UTF-8 Binary | UTF-8 Bytes | UTF-16 Binary | UTF-16 Units | UTF-32 Binary |
|---|---|---|---|---|---|---|
| A | U+0041 | 01000001 | 1 | 00000000 01000001 | 1 | 00000000 00000000 00000000 01000001 |
| é | U+00E9 | 11000011 10101001 | 2 | 00000000 11101001 | 1 | 00000000 00000000 00000000 11101001 |
| € | U+20AC | 11100010 10000010 10101100 | 3 | 00100000 10101100 | 1 | 00000000 00000000 00100000 10101100 |
| 漢 | U+6F22 | 11100110 10111100 10100010 | 3 | 01101111 00100010 | 1 | 00000000 00000000 01101111 00100010 |
| 𐍈 | U+10348 | 11110000 10010000 10001101 10001000 | 4 | 11011000 00000000 11011111 01001000 | 2 (surrogate pair) | 00000000 00000001 00000011 01001000 |
| 🚀 | U+1F680 | 11110000 10011111 10011010 10000000 | 4 | 11011000 00111101 11011110 10000000 | 2 (surrogate pair) | 00000000 00000001 11110110 10000000 |
| ♠ | U+2660 | 11100010 10011001 10100000 | 3 | 00100110 01100000 | 1 | 00000000 00000000 00100110 01100000 |
| Ω | U+03A9 | 11001110 10101001 | 2 | 00000011 10101001 | 1 | 00000000 00000000 00000011 10101001 |
| → | U+2192 | 11100010 10000110 10010010 | 3 | 00100001 10010010 | 1 | 00000000 00000000 00100001 10010010 |
| ∞ | U+221E | 11100010 10001000 10011110 | 3 | 00100010 00011110 | 1 | 00000000 00000000 00100010 00011110 |
| NUL | U+0000 | 00000000 | 1 | 00000000 00000000 | 1 | 00000000 00000000 00000000 00000000 |
| DEL | U+007F | 01111111 | 1 | 00000000 01111111 | 1 | 00000000 00000000 00000000 01111111 |
| Space | U+0020 | 00100000 | 1 | 00000000 00100000 | 1 | 00000000 00000000 00000000 00100000 |
| © | U+00A9 | 11000010 10101001 | 2 | 00000000 10101001 | 1 | 00000000 00000000 00000000 10101001 |
| 中 | U+4E2D | 11100100 10111000 10101101 | 3 | 01001110 00101101 | 1 | 00000000 00000000 01001110 00101101 |
| 🎵 | U+1F3B5 | 11110000 10011111 10001110 10110101 | 4 | 11011000 00111100 11011111 10110101 | 2 (surrogate pair) | 00000000 00000001 11110011 10110101 |
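
Any row of this table can be reproduced with Python's built-in codecs. The sketch below is only a way to cross-check the values, not the tool's own code; the helper name `to_binary` is hypothetical, and the big-endian codec names match the byte order the tool assumes.

```python
def to_binary(text: str, encoding: str = "utf-8") -> str:
    """Return the encoded bytes of text as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


# Cross-check one table row: U+20AC in each encoding.
euro = "\u20ac"
for name in ("utf-8", "utf-16-be", "utf-32-be"):
    print(f"{name:10} {to_binary(euro, name)}")
```

Using `utf-16-be` and `utf-32-be` rather than plain `utf-16` or `utf-32` matters here: the unsuffixed codecs prepend a Byte Order Mark, which would add bytes that do not appear in the table.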