
About

Binary-to-Unicode conversion is not a trivial bit-flip. A single misaligned byte in a UTF-8 stream corrupts every subsequent character. UTF-8 uses variable-width encoding: an ASCII character fits in a single byte, but a Chinese character like "漢" requires 3 bytes (24 bits), and an emoji like "🚀" demands 4 bytes (32 bits). This tool implements the full decoding pipelines for RFC 3629 (UTF-8), RFC 2781 (UTF-16, with surrogate pair handling for code points above U+FFFF), and UTF-32. It validates continuation byte patterns, rejects overlong encodings, and flags orphaned surrogates instead of silently producing garbage.
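These variable widths are easy to confirm in any language with Unicode support; a quick Python illustration (not the tool's own code):

```python
# Byte lengths of the same characters under the three encodings.
for ch in ("A", "漢", "🚀"):
    print(ch,
          len(ch.encode("utf-8")),      # A: 1, 漢: 3, 🚀: 4 bytes
          len(ch.encode("utf-16-be")),  # A: 2, 漢: 2, 🚀: 4 (surrogate pair)
          len(ch.encode("utf-32-be")))  # always 4 bytes
```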

Manual conversion with a calculator is error-prone because you must track byte boundaries, endianness, and multi-byte state across potentially thousands of bits. A miscount of even one bit shifts the entire frame. This converter auto-detects delimiters, validates binary input structure against the selected encoding, and reports the exact position of malformed sequences. It also operates in reverse: paste any Unicode text to get its precise binary representation in your chosen encoding. Limitations: this tool assumes big-endian byte order for UTF-16 and UTF-32. It does not handle Byte Order Marks (BOM) in input.
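The reverse direction (text to binary) amounts to encoding the text and printing each byte as an 8-bit group. A minimal sketch, where `text_to_binary` is a hypothetical helper and not part of this tool:

```python
def text_to_binary(text: str, encoding: str = "utf-8") -> str:
    """Render text as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))

print(text_to_binary("Hi"))  # 01001000 01101001
```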


Formulas

UTF-8 encodes code points into 1 to 4 bytes. The leading byte determines the sequence length. Continuation bytes always match the pattern 10xxxxxx.

0xxxxxxx                             if U ≤ 007F    (1 byte)
110xxxxx 10xxxxxx                    if U ≤ 07FF    (2 bytes)
1110xxxx 10xxxxxx 10xxxxxx           if U ≤ FFFF    (3 bytes)
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx  if U ≤ 10FFFF  (4 bytes)
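The template translates almost mechanically into code. A sketch of the encoder side (it fills the x bits most significant first but, for brevity, omits the surrogate-range rejection a full RFC 3629 implementation requires):

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a code point per the UTF-8 byte templates."""
    if cp <= 0x7F:
        return bytes([cp])                    # 0xxxxxxx
    if cp <= 0x7FF:
        return bytes([0xC0 | cp >> 6,         # 110xxxxx
                      0x80 | cp & 0x3F])      # 10xxxxxx
    if cp <= 0xFFFF:
        return bytes([0xE0 | cp >> 12,
                      0x80 | cp >> 6 & 0x3F,
                      0x80 | cp & 0x3F])
    if cp <= 0x10FFFF:
        return bytes([0xF0 | cp >> 18,
                      0x80 | cp >> 12 & 0x3F,
                      0x80 | cp >> 6 & 0x3F,
                      0x80 | cp & 0x3F])
    raise ValueError("code point out of range")

print(utf8_encode(0x6F22).hex())  # e6bca2 — the three bytes of 漢
```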

For UTF-16 surrogate pairs (code points above U+FFFF):

Uβ€² = U βˆ’ 0x10000
High Surrogate = 0xD800 + (Uβ€² >> 10)
Low Surrogate = 0xDC00 + (Uβ€² & 0x3FF)
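The same arithmetic in Python (a hypothetical helper, shown for illustration):

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high/low surrogates."""
    u = cp - 0x10000
    return 0xD800 + (u >> 10), 0xDC00 + (u & 0x3FF)

high, low = to_surrogate_pair(0x1F680)  # 🚀
print(hex(high), hex(low))              # 0xd83d 0xde80
```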

Where U is the Unicode code point (hexadecimal). The x bits in the UTF-8 template are filled with the binary representation of the code point, most significant bit first. UTF-32 is a direct 32-bit representation of the code point with zero-padding.

Reference Data

Character | Code Point | UTF-8 Binary | UTF-8 Bytes | UTF-16 Binary | UTF-16 Units | UTF-32 Binary
--------- | ---------- | ------------ | ----------- | ------------- | ------------ | -------------
A | U+0041 | 01000001 | 1 | 00000000 01000001 | 1 | 00000000 00000000 00000000 01000001
é | U+00E9 | 11000011 10101001 | 2 | 00000000 11101001 | 1 | 00000000 00000000 00000000 11101001
€ | U+20AC | 11100010 10000010 10101100 | 3 | 00100000 10101100 | 1 | 00000000 00000000 00100000 10101100
漢 | U+6F22 | 11100110 10111100 10100010 | 3 | 01101111 00100010 | 1 | 00000000 00000000 01101111 00100010
𐍈 | U+10348 | 11110000 10010000 10001101 10001000 | 4 | 11011000 00000000 11011111 01001000 | 2 (surrogate pair) | 00000000 00000001 00000011 01001000
🚀 | U+1F680 | 11110000 10011111 10011010 10000000 | 4 | 11011000 00111101 11011110 10000000 | 2 (surrogate pair) | 00000000 00000001 11110110 10000000
♠ | U+2660 | 11100010 10011001 10100000 | 3 | 00100110 01100000 | 1 | 00000000 00000000 00100110 01100000
Ω | U+03A9 | 11001110 10101001 | 2 | 00000011 10101001 | 1 | 00000000 00000000 00000011 10101001
→ | U+2192 | 11100010 10000110 10010010 | 3 | 00100001 10010010 | 1 | 00000000 00000000 00100001 10010010
∞ | U+221E | 11100010 10001000 10011110 | 3 | 00100010 00011110 | 1 | 00000000 00000000 00100010 00011110
NUL | U+0000 | 00000000 | 1 | 00000000 00000000 | 1 | 00000000 00000000 00000000 00000000
DEL | U+007F | 01111111 | 1 | 00000000 01111111 | 1 | 00000000 00000000 00000000 01111111
Space | U+0020 | 00100000 | 1 | 00000000 00100000 | 1 | 00000000 00000000 00000000 00100000
© | U+00A9 | 11000010 10101001 | 2 | 00000000 10101001 | 1 | 00000000 00000000 00000000 10101001
中 | U+4E2D | 11100100 10111000 10101101 | 3 | 01001110 00101101 | 1 | 00000000 00000000 01001110 00101101
🎵 | U+1F3B5 | 11110000 10011111 10001110 10110101 | 4 | 11011000 00111100 11011111 10110101 | 2 (surrogate pair) | 00000000 00000001 11110011 10110101
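Rows like these can be regenerated for any character; a short Python sketch (the `binary_groups` helper is illustrative, not part of the tool):

```python
def binary_groups(data: bytes) -> str:
    """Format bytes as space-separated 8-bit binary groups."""
    return " ".join(f"{b:08b}" for b in data)

ch = "€"
print(f"U+{ord(ch):04X}")                     # U+20AC
print(binary_groups(ch.encode("utf-8")))      # 11100010 10000010 10101100
print(binary_groups(ch.encode("utf-16-be")))  # 00100000 10101100
print(binary_groups(ch.encode("utf-32-be")))
```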

Frequently Asked Questions

What happens if I select the wrong encoding?

UTF-8 uses variable-width encoding. If your binary was generated using UTF-16 (16-bit units) or UTF-32 (32-bit units), selecting UTF-8 will misinterpret byte boundaries. A UTF-16 encoded "A" is 00000000 01000001 (16 bits), but UTF-8 would read those as two separate bytes: the first (00000000) as a NUL character. Always match the encoding to how the binary was originally produced.
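The NUL example is easy to reproduce; in Python, for instance:

```python
data = "A".encode("utf-16-be")  # b'\x00A' — 00000000 01000001
text = data.decode("utf-8")     # UTF-8 reads the same bytes one at a time
print([hex(ord(c)) for c in text])  # ['0x0', '0x41'] — a NUL, then 'A'
```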
How are emoji and other characters above U+FFFF handled?

Characters above U+FFFF (such as emoji, historic scripts, and mathematical symbols) require 4 bytes in UTF-8 and a surrogate pair (two 16-bit units) in UTF-16. The converter fully implements surrogate pair encoding and decoding per RFC 2781. In UTF-32, these code points are simply zero-padded to 32 bits. If you see a replacement character (U+FFFD), it means the binary sequence was malformed for that encoding.
What is an overlong encoding, and why is it rejected?

An overlong encoding represents a code point using more bytes than necessary: for example, encoding U+002F (/) as C0 AF (2 bytes) instead of the correct single byte 2F. This is a security vulnerability (used in directory traversal attacks). This converter rejects overlong encodings and flags them as invalid, per RFC 3629 Section 3.
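Most strict decoders behave the same way; Python's built-in UTF-8 codec, for example, refuses the overlong form:

```python
overlong = b"\xC0\xAF"  # overlong encoding of U+002F ('/')
try:
    overlong.decode("utf-8")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)  # 0xC0 is never a valid UTF-8 start byte
```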
Can I convert binary input that has no delimiters?

Yes. Select the "None (fixed-width)" delimiter option. The converter will split the input into chunks based on the encoding: 8-bit chunks for UTF-8, 16-bit for UTF-16, and 32-bit for UTF-32. The total number of bits must be evenly divisible by the chunk size, or the converter will report an error with the exact count mismatch.
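Fixed-width splitting is simple to sketch; `split_bits` below is a hypothetical helper mirroring the behavior described, not the tool's own code:

```python
def split_bits(bits: str, width: int) -> list[str]:
    """Split a bit string into fixed-width groups, rejecting ragged input."""
    if len(bits) % width:
        raise ValueError(f"{len(bits)} bits is not a multiple of {width}")
    return [bits[i:i + width] for i in range(0, len(bits), width)]

print(split_bits("0100100001101001", 8))  # ['01001000', '01101001'] -> "Hi"
```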
Why are code points U+D800 through U+DFFF invalid on their own?

The range U+D800 to U+DFFF is permanently reserved for UTF-16 surrogate pairs. These are not valid Unicode scalar values and must never appear as standalone code points. If a UTF-32 binary input decodes to a value in this range, or a UTF-8 stream encodes one of these values, the converter flags it as an error. This range contains exactly 2048 code points that are structurally excluded from the Unicode character set.
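Strict encoders enforce this exclusion; Python, for example, refuses to UTF-8-encode a lone surrogate:

```python
lone = chr(0xD800)  # a lone high surrogate
try:
    lone.encode("utf-8")
except UnicodeEncodeError as err:
    print("rejected:", err.reason)  # surrogates are not scalar values

print(0xDFFF - 0xD800 + 1)  # 2048 reserved code points
```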
Does the tool support little-endian input?

This tool assumes big-endian byte order for UTF-16 and UTF-32 inputs. In big-endian, the most significant byte comes first. If your binary data was produced in little-endian format (common on x86 systems), you need to reverse the byte order within each 16-bit or 32-bit unit before conversion. UTF-8 is byte-order independent because it is defined as a stream of individual bytes.
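For a single 16-bit unit, the little-endian fix is just a byte swap, as a quick Python check shows:

```python
be = "€".encode("utf-16-be")  # b'\x20\xac' — most significant byte first
le = "€".encode("utf-16-le")  # b'\xac\x20' — least significant byte first
assert le == bytes(reversed(be))  # for one unit, swapping == full reversal
print(be.hex(), le.hex())  # 20ac ac20
```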