Binary to UTF-8 Converter
Convert binary strings to UTF-8 text and UTF-8 text to its binary representation. Handles multi-byte Unicode characters using the native TextEncoder and TextDecoder APIs.
About
UTF-8 encodes each Unicode codepoint into 1 to 4 bytes using a variable-length scheme defined in RFC 3629. A naive per-character conversion that ignores multi-byte sequences corrupts any text outside ASCII: emoji, CJK ideographs, Cyrillic, Arabic, and letters with diacritics all require 2 to 4 bytes. This tool performs byte-level conversion via the browser's native TextEncoder and TextDecoder APIs. Decoding runs with TextDecoder's fatal mode enabled, so malformed sequences produce an explicit error rather than silent replacement characters (U+FFFD). Feed it raw binary octets and receive correct Unicode text, or paste any Unicode string and receive its exact UTF-8 binary representation.
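As a rough illustration of the two directions described above, here is a minimal sketch. The function names `textToBinary` and `binaryToText` are hypothetical and are not taken from the tool's source; only `TextEncoder`, `TextDecoder`, and `Uint8Array` are standard APIs. The sketch assumes whitespace between bits carries no meaning and is simply removed.

```javascript
// Hypothetical helper names; not the tool's actual code.

// Text -> space-separated 8-bit octets of its UTF-8 encoding.
function textToBinary(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  return Array.from(bytes, (b) => b.toString(2).padStart(8, "0")).join(" ");
}

// Binary octets -> text. Throws on partial bytes or malformed UTF-8.
function binaryToText(binary) {
  const bits = binary.replace(/\s+/g, ""); // assumption: whitespace is only a separator
  if (!/^[01]*$/.test(bits)) {
    throw new Error("Input may contain only 0, 1, and whitespace");
  }
  if (bits.length % 8 !== 0) {
    throw new Error("Input length is not a multiple of 8 bits");
  }
  const bytes = new Uint8Array(bits.length / 8);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(bits.slice(i * 8, i * 8 + 8), 2);
  }
  // fatal: true makes decode() throw a TypeError instead of emitting U+FFFD.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}
```

With these definitions, `textToBinary("é")` yields `11000011 10101001`, and `binaryToText("11100100 10111000 10101101")` yields `中`.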
Limitations: input is processed as a UTF-8 byte stream. If your binary represents UTF-16 or a legacy encoding (ISO-8859-1, Shift_JIS), the output will be incorrect. The tool assumes well-formed 8-bit aligned input. Partial bytes are rejected. Maximum recommended input is approximately 1 MB of text to avoid browser tab memory pressure.
Formulas
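UTF-8 encoding maps a Unicode codepoint U to a variable-length byte sequence. The number of bytes n is determined by the codepoint range:
$$
n = \begin{cases}
1 & \text{if } \text{U+0000} \le U \le \text{U+007F} \\
2 & \text{if } \text{U+0080} \le U \le \text{U+07FF} \\
3 & \text{if } \text{U+0800} \le U \le \text{U+FFFF} \\
4 & \text{if } \text{U+10000} \le U \le \text{U+10FFFF}
\end{cases}
$$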
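The range rule can be sketched as a small lookup. The helper name `utf8ByteLength` is hypothetical; the range boundaries follow the Reference Data table, and rejecting surrogates (U+D800 to U+DFFF) reflects the fact that they have no valid UTF-8 encoding.

```javascript
// Hypothetical helper: number of UTF-8 bytes needed for one codepoint.
function utf8ByteLength(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError("Not a Unicode codepoint");
  }
  if (codePoint >= 0xd800 && codePoint <= 0xdfff) {
    throw new RangeError("Surrogate halves cannot be encoded in UTF-8");
  }
  if (codePoint <= 0x7f) return 1;   // U+0000 to U+007F
  if (codePoint <= 0x7ff) return 2;  // U+0080 to U+07FF
  if (codePoint <= 0xffff) return 3; // U+0800 to U+FFFF
  return 4;                          // U+10000 to U+10FFFF
}
```

One way to sanity-check the result is to compare it with the length of `new TextEncoder().encode(String.fromCodePoint(codePoint))`, which should agree for every non-surrogate codepoint.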
For decoding binary to UTF-8 text, each byte B is examined. The leading bits of the first byte determine the sequence length:
0xxxxxxx → 1-byte
110xxxxx 10xxxxxx → 2-byte
1110xxxx 10xxxxxx 10xxxxxx → 3-byte
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx → 4-byte
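The prefix patterns above can be checked with bit masks. This is only a sketch with a hypothetical helper name, `sequenceLength`; it classifies a byte by its leading bits and does not, on its own, reject overlong lead bytes or values beyond U+10FFFF. Those checks are left to the fatal `TextDecoder`.

```javascript
// Hypothetical helper: expected sequence length implied by a lead byte.
// Returns 0 when the byte cannot start a sequence.
function sequenceLength(leadByte) {
  if ((leadByte & 0b10000000) === 0b00000000) return 1; // 0xxxxxxx
  if ((leadByte & 0b11100000) === 0b11000000) return 2; // 110xxxxx
  if ((leadByte & 0b11110000) === 0b11100000) return 3; // 1110xxxx
  if ((leadByte & 0b11111000) === 0b11110000) return 4; // 11110xxx
  return 0; // 10xxxxxx (continuation byte) or 11111xxx (never valid)
}
```

For instance, `sequenceLength(0b11100100)` returns 3, matching the first byte of 中 in the reference table, while `sequenceLength(0b10101001)` returns 0 because a continuation byte cannot start a sequence.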
The conversion from a single binary octet string to its decimal byte value uses positional notation:
$$
B = \sum_{i=0}^{7} b_i \cdot 2^{i}
$$
Where $b_i$ is the bit at position i (counting from the right, starting at 0). The tool collects all bytes into a Uint8Array and passes the entire array to TextDecoder with fatal: true, which rejects overlong encodings, surrogate halves, and truncated sequences per the Unicode standard.
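A minimal sketch of this step follows. The positional sum is written out explicitly for illustration, and the helper names `octetToByte` and `decodeOctets` are hypothetical. In practice `parseInt(octet, 2)` gives the same value.

```javascript
// Hypothetical helper: positional notation, summing bit_i * 2^i
// with i counted from the right, starting at 0.
function octetToByte(octet) {
  if (!/^[01]{8}$/.test(octet)) {
    throw new Error("Expected exactly 8 binary digits");
  }
  let value = 0;
  for (let i = 0; i < 8; i++) {
    const bit = octet[7 - i] === "1" ? 1 : 0; // bit at position i
    value += bit * 2 ** i;
  }
  return value;
}

// Hypothetical helper: collect every byte first, then decode the whole
// array at once so multi-byte sequences are seen together.
function decodeOctets(octets) {
  const bytes = Uint8Array.from(octets, (octet) => octetToByte(octet));
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}
```

Here `octetToByte("10101001")` gives 169, and `decodeOctets(["11100010", "10000010", "10101100"])` gives `€`. Dropping the last octet leaves a truncated sequence, so `decode()` throws a `TypeError` instead of returning U+FFFD.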
Reference Data
| Byte Count | Codepoint Range | Byte 1 | Byte 2 | Byte 3 | Byte 4 | Bits for Codepoint | Example Char | Example Codepoint | Example Binary (UTF-8 Bytes) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | U+0000 - U+007F | 0xxxxxxx | - | - | - | 7 | A | U+0041 | 01000001 |
| 2 | U+0080 - U+07FF | 110xxxxx | 10xxxxxx | - | - | 11 | é | U+00E9 | 11000011 10101001 |
| 3 | U+0800 - U+FFFF | 1110xxxx | 10xxxxxx | 10xxxxxx | - | 16 | 中 | U+4E2D | 11100100 10111000 10101101 |
| 4 | U+10000 - U+10FFFF | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx | 21 | 😀 | U+1F600 | 11110000 10011111 10011000 10000000 |
| ASCII Control Characters (Common) | |||||||||
| 1 | U+0000 | 00000000 | - | - | - | 7 | NUL | U+0000 | 00000000 |
| 1 | U+000A | 00001010 | - | - | - | 7 | LF (\n) | U+000A | 00001010 |
| 1 | U+000D | 00001101 | - | - | - | 7 | CR (\r) | U+000D | 00001101 |
| 1 | U+0020 | 00100000 | - | - | - | 7 | Space | U+0020 | 00100000 |
| Multi-byte Symbols & Scripts | |||||||||
| 2 | U+00A9 | 11000010 | 10101001 | - | - | 11 | © | U+00A9 | 11000010 10101001 |
| 2 | U+03B1 | 11001110 | 10110001 | - | - | 11 | α | U+03B1 | 11001110 10110001 |
| 2 | U+0414 | 11010000 | 10010100 | - | - | 11 | Д | U+0414 | 11010000 10010100 |
| 3 | U+20AC | 11100010 | 10000010 | 10101100 | - | 16 | € | U+20AC | 11100010 10000010 10101100 |
| 3 | U+3042 | 11100011 | 10000001 | 10000010 | - | 16 | あ | U+3042 | 11100011 10000001 10000010 |
| 4 | U+1F4A9 | 11110000 | 10011111 | 10010010 | 10101001 | 21 | 💩 | U+1F4A9 | 11110000 10011111 10010010 10101001 |
| 4 | U+1D11E | 11110000 | 10011101 | 10000100 | 10011110 | 21 | 𝄞 | U+1D11E | 11110000 10011101 10000100 10011110 |
| Invalid / Edge Cases | |||||||||
| - | Continuation byte without leading byte | - | - | - | - | - | ERROR | - | 10000001 |
| - | Overlong encoding (2-byte for ASCII) | - | - | - | - | - | ERROR | - | 11000000 10100001 |
| - | Surrogate half (U+D800 - U+DFFF) | - | - | - | - | - | ERROR | - | 11101101 10100000 10000000 |
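
The rows of the table can be reproduced by packing a codepoint's bits into the byte templates by hand. The sketch below is illustrative only: `encodeCodePoint`, `toBinary`, and `isValidUtf8` are hypothetical names, and the tool itself is described as relying on `TextEncoder` rather than manual bit packing.

```javascript
// Hypothetical helper: fill the x positions of each template with codepoint bits.
function encodeCodePoint(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError("Not an encodable Unicode codepoint");
  }
  const cont = (bits) => 0b10000000 | (bits & 0b00111111); // 10xxxxxx
  if (cp <= 0x7f) return [cp];                                        // 0xxxxxxx
  if (cp <= 0x7ff) return [0b11000000 | (cp >> 6), cont(cp)];         // 110xxxxx
  if (cp <= 0xffff) {
    return [0b11100000 | (cp >> 12), cont(cp >> 6), cont(cp)];        // 1110xxxx
  }
  return [0b11110000 | (cp >> 18), cont(cp >> 12), cont(cp >> 6), cont(cp)]; // 11110xxx
}

// Format a byte array as space-separated octets.
const toBinary = (bytes) =>
  bytes.map((b) => b.toString(2).padStart(8, "0")).join(" ");

// Hypothetical helper: true when a fatal TextDecoder accepts the octets.
function isValidUtf8(binary) {
  const bytes = Uint8Array.from(binary.trim().split(/\s+/), (o) => parseInt(o, 2));
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}
```

For example, `toBinary(encodeCodePoint(0x20AC))` gives `11100010 10000010 10101100`, the € row. Passing any of the three byte strings from the invalid rows to `isValidUtf8` returns `false`.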