About

UTF-8 encodes each Unicode codepoint into 1 to 4 bytes using a variable-length scheme defined in RFC 3629. A naive per-character conversion that ignores multi-byte sequences will corrupt any text outside ASCII: emoji, CJK ideographs, Cyrillic, Arabic, and letters with diacritics all require 2 to 4 bytes. This tool performs real byte-level encoding via the browser's native TextEncoder and TextDecoder APIs. The decoder runs in fatal mode, so malformed sequences produce an explicit error rather than silent replacement characters (U+FFFD). Feed it raw binary octets and receive correct Unicode text, or paste any Unicode string and receive its exact UTF-8 binary representation.
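The round trip described above can be sketched with the standard TextEncoder and TextDecoder globals. This is a minimal illustration, not the tool's actual source; the helper names textToBinary and bytesToText are hypothetical.

```javascript
// Hypothetical helper: encode a string as space-separated UTF-8 octets.
function textToBinary(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  return Array.from(bytes, (b) => b.toString(2).padStart(8, "0")).join(" ");
}

// Hypothetical helper: decode UTF-8 bytes, throwing on malformed input
// instead of substituting U+FFFD.
function bytesToText(bytes) {
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}
```

For example, textToBinary("é") yields 11000011 10101001, matching the 2-byte row of the reference table.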

Limitations: input is processed as a UTF-8 byte stream. If your binary represents UTF-16 or a legacy encoding (ISO-8859-1, Shift_JIS), the output will be incorrect. The tool assumes well-formed 8-bit aligned input. Partial bytes are rejected. Maximum recommended input is approximately 1 MB of text to avoid browser tab memory pressure.

Formulas

UTF-8 encoding maps a Unicode codepoint U to a variable-length byte sequence. The number of bytes n is determined by the codepoint range:

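$$
n = \begin{cases}
1 & \text{if } U \le \mathrm{007F}_{16} \\
2 & \text{if } \mathrm{0080}_{16} \le U \le \mathrm{07FF}_{16} \\
3 & \text{if } \mathrm{0800}_{16} \le U \le \mathrm{FFFF}_{16} \\
4 & \text{if } \mathrm{10000}_{16} \le U \le \mathrm{10FFFF}_{16}
\end{cases}
$$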
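The range rule above translates directly into a chain of comparisons. The sketch below is illustrative only; utf8ByteCount is a hypothetical name, and the surrogate check is an added assumption based on the fact that U+D800 to U+DFFF cannot be encoded in UTF-8.

```javascript
// Hypothetical helper: number of UTF-8 bytes needed for one codepoint.
function utf8ByteCount(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError("codepoint must be an integer in U+0000..U+10FFFF");
  }
  // Assumption: surrogate halves are rejected because UTF-8 cannot encode them.
  if (codePoint >= 0xd800 && codePoint <= 0xdfff) {
    throw new RangeError("surrogate halves are not encodable in UTF-8");
  }
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4;
}
```

A call such as utf8ByteCount("中".codePointAt(0)) falls in the third range and returns 3.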

For decoding binary to UTF-8 text, each byte B is examined. The leading bits of the first byte determine the sequence length:

0xxxxxxx → 1-byte (ASCII)
110xxxxx 10xxxxxx → 2-byte
1110xxxx 10xxxxxx 10xxxxxx → 3-byte
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx → 4-byte
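The prefix patterns above can be checked with bit masks. The following is a sketch of the classification step only, under the assumption that a return value of 0 signals "not a lead byte"; the function name is hypothetical. It does not validate the sequence: bytes such as 11000000 match a lead pattern but never occur in well-formed UTF-8, so full validation is still left to the decoder.

```javascript
// Hypothetical helper: sequence length implied by a lead byte's prefix bits.
// Returns 0 for a continuation byte (10xxxxxx) or an unused pattern (11111xxx).
function sequenceLengthFromLeadByte(byte) {
  if ((byte & 0b10000000) === 0b00000000) return 1; // 0xxxxxxx
  if ((byte & 0b11100000) === 0b11000000) return 2; // 110xxxxx
  if ((byte & 0b11110000) === 0b11100000) return 3; // 1110xxxx
  if ((byte & 0b11111000) === 0b11110000) return 4; // 11110xxx
  return 0;
}
```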

The conversion from a single binary octet string to its decimal byte value uses positional notation:

$$
\text{byte} = \sum_{i=0}^{7} b_i \cdot 2^{i}
$$

Where b_i is the bit at position i (counting from the right, starting at 0). The tool collects all bytes into a Uint8Array and passes the entire array to TextDecoder with fatal: true, which rejects overlong encodings, surrogate halves, and truncated sequences per the Unicode standard.
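As an illustration of the positional sum and the whole-array decode, the sketch below converts each octet string to a byte value and hands the full Uint8Array to a fatal TextDecoder. The names octetToByte and decodeOctets are hypothetical, and the tool's real implementation may differ.

```javascript
// Hypothetical helper: evaluate the sum of b_i * 2^i for one 8-digit octet.
function octetToByte(octet) {
  if (!/^[01]{8}$/.test(octet)) {
    throw new RangeError("expected exactly 8 binary digits");
  }
  let value = 0;
  for (let i = 0; i < 8; i++) {
    const bit = octet.charCodeAt(7 - i) - 48; // b_i, counted from the right
    value += bit * 2 ** i;
  }
  return value;
}

// Hypothetical helper: collect all bytes first, then decode them in one pass.
function decodeOctets(octets) {
  const bytes = Uint8Array.from(octets, (o) => octetToByte(o));
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}
```

Decoding the array as a whole matters: decodeOctets(["11100100", "10111000", "10101101"]) returns "中", whereas decoding each of those bytes separately would fail.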

Reference Data

Byte Count	Codepoint Range	Byte 1	Byte 2	Byte 3	Byte 4	Bits for Codepoint	Example Char	Example Codepoint	Example Binary (UTF-8 Bytes)
1	U+0000 - U+007F	0xxxxxxx	-	-	-	7	A	U+0041	01000001
2	U+0080 - U+07FF	110xxxxx	10xxxxxx	-	-	11	é	U+00E9	11000011 10101001
3	U+0800 - U+FFFF	1110xxxx	10xxxxxx	10xxxxxx	-	16	中	U+4E2D	11100100 10111000 10101101
4	U+10000 - U+10FFFF	11110xxx	10xxxxxx	10xxxxxx	10xxxxxx	21	😀	U+1F600	11110000 10011111 10011000 10000000
ASCII Control Characters (Common)
1	U+0000	00000000	-	-	-	7	NUL	U+0000	00000000
1	U+000A	00001010	-	-	-	7	LF (\n)	U+000A	00001010
1	U+000D	00001101	-	-	-	7	CR (\r)	U+000D	00001101
1	U+0020	00100000	-	-	-	7	Space	U+0020	00100000
Multi-byte Symbols & Scripts
2	U+00A9	11000010	10101001	-	-	11	©	U+00A9	11000010 10101001
2	U+03B1	11001110	10110001	-	-	11	α	U+03B1	11001110 10110001
2	U+0414	11010000	10010100	-	-	11	Д	U+0414	11010000 10010100
3	U+20AC	11100010	10000010	10101100	-	16	€	U+20AC	11100010 10000010 10101100
3	U+3042	11100011	10000001	10000010	-	16	あ	U+3042	11100011 10000001 10000010
4	U+1F4A9	11110000	10011111	10010010	10101001	21	💩	U+1F4A9	11110000 10011111 10010010 10101001
4	U+1D11E	11110000	10011101	10000100	10011110	21	𝄞	U+1D11E	11110000 10011101 10000100 10011110
Invalid / Edge Cases
-	Continuation byte without leading byte					-	ERROR	-	10000001
-	Overlong encoding (2-byte for ASCII)					-	ERROR	-	11000000 10100001
-	Surrogate half (U+D800 - U+DFFF)					-	ERROR	-	11101101 10100000 10000000

Frequently Asked Questions

The total number of binary digits (after removing spaces and prefixes) must be a multiple of 8. UTF-8 operates on whole bytes (8 bits each). If your input has, say, 13 bits, the tool cannot form complete bytes and will reject it. Pad the incomplete byte with leading zeros to reach a multiple of 8. Additionally, even with correct byte alignment, the byte sequence must form valid UTF-8: a leading byte like 11000011 requires exactly one continuation byte (10xxxxxx) to follow.
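The alignment rule can be sketched as a small pre-check that runs before any decoding. This is an assumption about how such a check might look, not the tool's code; splitIntoOctets and its error messages are hypothetical.

```javascript
// Hypothetical helper: split a cleaned bit string into 8-digit octets,
// rejecting input whose length is not a multiple of 8.
function splitIntoOctets(bits) {
  if (!/^[01]*$/.test(bits)) {
    throw new RangeError("input must contain only 0 and 1");
  }
  if (bits.length % 8 !== 0) {
    const missing = 8 - (bits.length % 8);
    throw new RangeError(
      `got ${bits.length} bits; ${missing} more needed to complete the last byte`
    );
  }
  const octets = [];
  for (let i = 0; i < bits.length; i += 8) {
    octets.push(bits.slice(i, i + 8));
  }
  return octets;
}
```

With a 13-bit input the function throws and reports that 3 more bits are needed.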

The tool decodes the entire binary input as a raw byte stream using the browser's native TextDecoder API set to UTF-8. It does not decode character-by-character. This means a 4-byte emoji like 😀 (U+1F600) is correctly reconstructed from its 32 binary digits: 11110000 10011111 10011000 10000000. In reverse, TextEncoder produces the correct multi-byte representation automatically.

ASCII covers only codepoints U+0000 to U+007F, a total of 128 characters, each exactly 1 byte. UTF-8 is a superset: ASCII characters remain 1 byte and identical in both encodings. The difference appears with characters above U+007F. For example, the letter "é" (U+00E9) requires 2 bytes in UTF-8 (11000011 10101001). A tool limited to ASCII would either fail or produce garbage for non-ASCII input.

Yes. The tool strips 0b prefixes, spaces, commas, tabs, newlines, and pipe characters before processing. Input like 0b01001000 0b01101001 is cleaned to 0100100001101001 and decoded normally to "Hi". Any character that is not 0 or 1 after prefix removal is treated as a separator and removed.
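A cleaning pass along these lines could be written with two regular expressions. This is a sketch under the assumption that prefixes are removed first and every remaining non-binary character is then dropped; cleanBinaryInput is a hypothetical name.

```javascript
// Hypothetical helper: normalize user input to a bare string of 0s and 1s.
function cleanBinaryInput(raw) {
  return raw
    .replace(/0b/gi, "")    // strip 0b / 0B prefixes
    .replace(/[^01]/g, ""); // drop spaces, commas, tabs, newlines, pipes, etc.
}
```

For instance, cleanBinaryInput("0b01001000 0b01101001") returns 0100100001101001, the two bytes of "Hi".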

The TextDecoder is instantiated with fatal: true. Per the Encoding Standard (WHATWG), this throws a TypeError on any malformed sequence instead of inserting the replacement character U+FFFD. Overlong encodings (e.g., encoding ASCII "A" as 2 bytes 11000001 10000001) and surrogate halves (U+D800 - U+DFFF) are both rejected with a clear error message.
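The behaviour of fatal mode can be observed with a short wrapper that catches the thrown TypeError. The wrapper name, its result shape, and the error text below are illustrative assumptions rather than the tool's actual messages.

```javascript
// Hypothetical helper: decode byte values, reporting failure instead of throwing.
function tryDecodeUtf8(byteValues) {
  try {
    const decoder = new TextDecoder("utf-8", { fatal: true });
    return { ok: true, text: decoder.decode(Uint8Array.from(byteValues)) };
  } catch (err) {
    // fatal: true makes decode() throw a TypeError on malformed input.
    const reason = err instanceof TypeError ? "malformed UTF-8" : String(err);
    return { ok: false, error: reason };
  }
}
```

Under this sketch, the overlong pair [0xC1, 0x81], the surrogate half [0xED, 0xA0, 0x80], and the lone continuation byte [0x81] all come back with ok set to false, while [0xF0, 0x9F, 0x98, 0x80] decodes to 😀.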

The tool enforces a soft limit of approximately 1 MB of input. For text-to-binary conversion, 1 MB of text expands to roughly 8 million binary digits. Beyond this limit, browser tabs may experience memory pressure or UI lag. For binary-to-text conversion, 1 MB of binary digits (about 1 million bits, not counting separators) decodes to roughly 125 KB of text, which is sufficient for most practical use cases.