About

UTF-8 encodes each Unicode codepoint into 1 to 4 bytes using a variable-length scheme defined in RFC 3629. A naive per-character conversion that ignores multi-byte sequences will corrupt any text outside ASCII: emoji, CJK ideographs, Cyrillic, Arabic, and diacritics all require 2 to 4 bytes. This tool performs real byte-level encoding via the browser's native TextEncoder and TextDecoder APIs with fatal mode enabled, so malformed sequences produce an explicit error rather than silent replacement characters (U+FFFD). Feed it raw binary octets and receive the correct Unicode text, or paste any Unicode string and receive its exact UTF-8 binary representation.
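The round trip described above can be sketched in a few lines (variable names are illustrative, not the tool's actual source). TextEncoder always emits UTF-8; TextDecoder constructed with { fatal: true } throws on malformed input instead of emitting U+FFFD:

```javascript
// TextEncoder always produces UTF-8 bytes; TextDecoder with { fatal: true }
// raises a TypeError on malformed sequences rather than substituting U+FFFD.
const encoder = new TextEncoder();
const decoder = new TextDecoder("utf-8", { fatal: true });

const bytes = encoder.encode("é"); // Uint8Array [0xC3, 0xA9] — two bytes for one character
const text = decoder.decode(bytes); // back to "é"
```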

Limitations: input is processed as a UTF-8 byte stream. If your binary represents UTF-16 or a legacy encoding (ISO-8859-1, Shift_JIS), the output will be incorrect. The tool assumes well-formed 8-bit aligned input. Partial bytes are rejected. Maximum recommended input is approximately 1 MB of text to avoid browser tab memory pressure.


Formulas

UTF-8 encoding maps a Unicode codepoint U to a variable-length byte sequence. The number of bytes n is determined by the codepoint range:

n = 1 if U ≤ 007F₁₆
    2 if 0080₁₆ ≤ U ≤ 07FF₁₆
    3 if 0800₁₆ ≤ U ≤ FFFF₁₆
    4 if 10000₁₆ ≤ U ≤ 10FFFF₁₆
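The range rule above translates directly into a small helper (utf8ByteCount is a hypothetical name for illustration, not part of the tool):

```javascript
// Number of UTF-8 bytes n for a Unicode codepoint U, per the piecewise rule above.
function utf8ByteCount(U) {
  if (U <= 0x7F) return 1;
  if (U <= 0x7FF) return 2;
  if (U <= 0xFFFF) return 3;
  if (U <= 0x10FFFF) return 4;
  throw new RangeError("codepoint beyond the Unicode range");
}
```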

For decoding binary to UTF-8 text, each byte B is examined. The leading bits of the first byte determine the sequence length:

0xxxxxxx 1-byte (ASCII)
110xxxxx 10xxxxxx 2-byte
1110xxxx 10xxxxxx 10xxxxxx 3-byte
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx 4-byte
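The leading-bit patterns above can be checked with bit masks; here is a minimal sketch (sequenceLength is an assumed helper name):

```javascript
// Classify a leading byte by its high bits to find the sequence length.
function sequenceLength(firstByte) {
  if ((firstByte & 0b10000000) === 0b00000000) return 1; // 0xxxxxxx (ASCII)
  if ((firstByte & 0b11100000) === 0b11000000) return 2; // 110xxxxx
  if ((firstByte & 0b11110000) === 0b11100000) return 3; // 1110xxxx
  if ((firstByte & 0b11111000) === 0b11110000) return 4; // 11110xxx
  return 0; // 10xxxxxx continuation byte, or an invalid leading byte
}
```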

The conversion from a single binary octet string to its decimal byte value uses positional notation:

byte = Σ (i = 0 to 7) bᵢ · 2ⁱ

where bᵢ is the bit at position i (counting from the right, starting at 0). The tool collects all bytes into a Uint8Array and passes the entire array to TextDecoder with fatal: true, which rejects overlong encodings, surrogate halves, and truncated sequences per the Unicode standard.
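Putting the two steps together, a decoder along these lines turns a cleaned bit string into text (bitsToText is a hypothetical name; parseInt with radix 2 applies exactly the positional sum above):

```javascript
// Convert a string of binary digits (length a multiple of 8) to UTF-8 text.
function bitsToText(bits) {
  if (bits.length % 8 !== 0) {
    throw new Error("bit count must be a multiple of 8");
  }
  const bytes = new Uint8Array(bits.length / 8);
  for (let i = 0; i < bytes.length; i++) {
    // parseInt(..., 2) evaluates the positional sum Σ bᵢ · 2ⁱ for one octet.
    bytes[i] = parseInt(bits.slice(i * 8, i * 8 + 8), 2);
  }
  // fatal: true rejects malformed sequences instead of emitting U+FFFD.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}
```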

Reference Data

Byte Count | Codepoint Range | Byte 1 | Byte 2 | Byte 3 | Byte 4 | Bits for Codepoint | Example Char | Example Codepoint | Example Binary (UTF-8 Bytes)
1 | U+0000 - U+007F | 0xxxxxxx | - | - | - | 7 | A | U+0041 | 01000001
2 | U+0080 - U+07FF | 110xxxxx | 10xxxxxx | - | - | 11 | é | U+00E9 | 11000011 10101001
3 | U+0800 - U+FFFF | 1110xxxx | 10xxxxxx | 10xxxxxx | - | 16 | 中 | U+4E2D | 11100100 10111000 10101101
4 | U+10000 - U+10FFFF | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx | 21 | 😀 | U+1F600 | 11110000 10011111 10011000 10000000

ASCII Control Characters (Common)
1 | U+0000 | 00000000 | - | - | - | 7 | NUL | U+0000 | 00000000
1 | U+000A | 00001010 | - | - | - | 7 | LF (\n) | U+000A | 00001010
1 | U+000D | 00001101 | - | - | - | 7 | CR (\r) | U+000D | 00001101
1 | U+0020 | 00100000 | - | - | - | 7 | Space | U+0020 | 00100000

Multi-byte Symbols & Scripts
2 | U+00A9 | 11000010 | 10101001 | - | - | 11 | © | U+00A9 | 11000010 10101001
2 | U+03B1 | 11001110 | 10110001 | - | - | 11 | α | U+03B1 | 11001110 10110001
2 | U+0414 | 11010000 | 10010100 | - | - | 11 | Д | U+0414 | 11010000 10010100
3 | U+20AC | 11100010 | 10000010 | 10101100 | - | 16 | € | U+20AC | 11100010 10000010 10101100
3 | U+3042 | 11100011 | 10000001 | 10000010 | - | 16 | あ | U+3042 | 11100011 10000001 10000010
4 | U+1F4A9 | 11110000 | 10011111 | 10010010 | 10101001 | 21 | 💩 | U+1F4A9 | 11110000 10011111 10010010 10101001
4 | U+1D11E | 11110000 | 10011101 | 10000100 | 10011110 | 21 | 𝄞 | U+1D11E | 11110000 10011101 10000100 10011110

Invalid / Edge Cases
Continuation byte without leading byte | ERROR | 10000001
Overlong encoding (2-byte for ASCII) | ERROR | 11000000 10100001
Surrogate half (U+D800 - U+DFFF) | ERROR | 11101101 10100000 10000000

Frequently Asked Questions

Why must the number of binary digits be a multiple of 8?

The total number of binary digits (after removing spaces and prefixes) must be a multiple of 8, because UTF-8 operates on whole bytes (8 bits each). If your input has, say, 13 bits, the tool cannot form complete bytes and will reject it. Pad with leading zeros to reach a multiple of 8. Even with correct byte alignment, the byte sequence must still form valid UTF-8: a leading byte like 11000011 requires exactly one continuation byte (10xxxxxx) to follow.
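The leading-zero padding mentioned above is a one-liner; a small sketch (padToBytes is an illustrative name):

```javascript
// Pad a bit string with leading zeros up to the next multiple of 8.
function padToBytes(bits) {
  const rem = bits.length % 8;
  return rem === 0 ? bits : "0".repeat(8 - rem) + bits;
}
```

Note that padding on the left preserves the numeric value of the final byte, e.g. the 7-bit "1000001" becomes "01000001" (ASCII "A").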
How does the tool handle emoji and other multi-byte characters?

The tool decodes the entire binary input as a raw byte stream using the browser's native TextDecoder API set to UTF-8. It does not decode character-by-character. This means a 4-byte emoji like 😀 (U+1F600) is correctly reconstructed from its 32 binary digits: 11110000 10011111 10011000 10000000. In reverse, TextEncoder produces the correct multi-byte representation automatically.
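The whole-stream behavior can be verified directly with the 32 digits quoted above (variable names are illustrative):

```javascript
// 😀 (U+1F600) as 32 binary digits, decoded in one pass as a byte stream.
const bits = "11110000100111111001100010000000";
const bytes = Uint8Array.from(bits.match(/.{8}/g), (b) => parseInt(b, 2));
const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
// text is a single codepoint, not four garbage characters
```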
What is the difference between ASCII and UTF-8?

ASCII covers only codepoints U+0000 through U+007F (128 characters), each exactly 1 byte. UTF-8 is a superset: ASCII characters remain 1 byte and identical in both encodings. The difference appears with characters above U+007F. For example, the letter "é" (U+00E9) requires 2 bytes in UTF-8 (11000011 10101001). A tool limited to ASCII would either fail or produce garbage for non-ASCII input.
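The superset relationship is easy to observe by comparing encoded byte lengths (a minimal sketch):

```javascript
// ASCII characters stay 1 byte in UTF-8; characters above U+007F grow.
const enc = new TextEncoder();
const asciiLen = enc.encode("A").length; // 1 byte, identical to ASCII
const accentLen = enc.encode("é").length; // 2 bytes, UTF-8 only
```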
Can I paste binary with 0b prefixes or separators?

Yes. The tool strips 0b prefixes, spaces, commas, tabs, newlines, and pipe characters before processing. Input like 0b01001000 0b01101001 is cleaned to 0100100001101001 and decoded normally to "Hi". Any character that is not 0 or 1 after prefix removal is treated as a separator and removed.
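That cleaning step amounts to two replacements; a sketch of the idea (cleanBits is a hypothetical name, not the tool's actual code):

```javascript
// Remove 0b prefixes first, then drop every remaining non-bit character.
function cleanBits(input) {
  return input.replace(/0b/g, "").replace(/[^01]/g, "");
}
```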
What happens if the bytes are not valid UTF-8?

The TextDecoder is instantiated with fatal: true. Per the Encoding Standard (WHATWG), this throws a TypeError on any malformed sequence instead of inserting the replacement character U+FFFD. Overlong encodings (e.g., encoding ASCII "A" as the 2 bytes 11000001 10000001) and surrogate halves (U+D800 - U+DFFF) are both rejected with a clear error message.
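The overlong example above can be reproduced directly; the error type is specified by the WHATWG Encoding Standard (variable names are illustrative):

```javascript
// An overlong 2-byte attempt at ASCII "A" (0x41): 11000001 10000001.
// Under { fatal: true } this throws instead of producing U+FFFD characters.
const decoder = new TextDecoder("utf-8", { fatal: true });
let error = null;
try {
  decoder.decode(new Uint8Array([0b11000001, 0b10000001]));
} catch (e) {
  error = e; // a TypeError, per the Encoding Standard
}
```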
Is there a maximum input size?

The tool enforces a soft limit of approximately 1 MB of input text. Beyond this, browser tabs may experience memory pressure or UI lag. In the binary-to-text direction, 1 MB of binary digits (8 digits per byte) decodes to roughly 125 KB of text, which is sufficient for most practical use cases.