About

Binary-to-text conversion maps sequences of 0s and 1s to human-readable characters using a defined encoding standard. In UTF-8, each character occupies 1 to 4 bytes, so a single emoji can require 32 bits while a basic Latin letter needs only 8. Misinterpreting byte boundaries or using the wrong encoding produces garbled output or silent data corruption. This tool uses the browser's native TextDecoder and TextEncoder APIs with full UTF-8 support. It auto-detects delimiters (spaces, commas, tabs, or a continuous stream with no delimiter) and validates that every chunk is a legal binary octet before decoding.

Limitation: this converter assumes well-formed UTF-8 input. Arbitrary binary that does not form valid UTF-8 byte sequences will produce the Unicode replacement character U+FFFD. For raw binary data inspection, a hex editor is more appropriate. Pro tip: when pasting binary from external sources, watch for invisible Unicode whitespace characters, such as the no-break space U+00A0 or the zero-width space U+200B, that look like ordinary spaces (or like nothing at all) but have different code points.
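As a rough illustration of the pro tip above, the following sketch scans pasted input for whitespace-like characters other than the ordinary space, tab, and line breaks. The function name findInvisibleWhitespace and the exact character list are assumptions made for this example, not part of the converter itself.

```javascript
// Hypothetical helper: report whitespace-like characters that are not the
// plain space, tab, CR, or LF a binary parser would normally expect.
function findInvisibleWhitespace(input) {
  // [^\S \t\r\n] matches anything \s matches except space, tab, CR, LF
  // (for example U+00A0 and U+3000). Zero-width characters are not covered
  // by \s, so U+200B to U+200D and U+2060 are listed explicitly.
  const suspicious = /[^\S \t\r\n]|[\u200B-\u200D\u2060]/gu;
  const findings = [];
  for (const match of input.matchAll(suspicious)) {
    const codePoint = match[0].codePointAt(0);
    findings.push({
      index: match.index,
      codePoint: "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0"),
    });
  }
  return findings;
}
```

An empty result means the input contains only ordinary whitespace; otherwise each entry gives the position and code point of a character worth replacing before conversion.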

Formulas

Converting a single ASCII character to its binary representation requires extracting the character's code point and expressing it in base 2, zero-padded to 8 bits.

B = pad(c.toString(2), 8)

Where B is the binary string output, c is the Unicode code point (integer) of the character, and pad left-pads its first argument with zeros to the given width. For UTF-8 multi-byte encoding, the encoder maps code points to 1 to 4 bytes following this scheme:

1 byte: 0xxxxxxx if c ≤ U+007F
2 bytes: 110xxxxx 10xxxxxx if c ≤ U+07FF
3 bytes: 1110xxxx 10xxxxxx 10xxxxxx if c ≤ U+FFFF
4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx if c ≤ U+10FFFF
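The byte layout above can be sketched by hand with bit shifts and masks. This is a minimal illustration, not the converter's code (the tool itself relies on TextEncoder); the names utf8Bytes and toBinary are hypothetical.

```javascript
// Encode one Unicode code point into its UTF-8 bytes, following the
// 1 to 4 byte layout. Surrogates (U+D800 to U+DFFF) are not valid scalar
// values and are rejected.
function utf8Bytes(c) {
  if (!Number.isInteger(c) || c < 0 || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff)) {
    throw new RangeError("Not a valid Unicode scalar value: " + c);
  }
  if (c <= 0x7f) return [c];
  if (c <= 0x7ff) return [0xc0 | (c >> 6), 0x80 | (c & 0x3f)];
  if (c <= 0xffff) {
    return [0xe0 | (c >> 12), 0x80 | ((c >> 6) & 0x3f), 0x80 | (c & 0x3f)];
  }
  return [
    0xf0 | (c >> 18),
    0x80 | ((c >> 12) & 0x3f),
    0x80 | ((c >> 6) & 0x3f),
    0x80 | (c & 0x3f),
  ];
}

// Format each byte as a zero-padded 8-bit binary string.
function toBinary(bytes) {
  return bytes.map((b) => b.toString(2).padStart(8, "0")).join(" ");
}

console.log(toBinary(utf8Bytes("A".codePointAt(0)))); // 01000001
console.log(toBinary(utf8Bytes(0x20ac))); // 11100010 10000010 10101100
```

The second call reproduces the Euro sign row of the reference table.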

The reverse operation (binary to string) parses each 8-bit group as an unsigned byte value $v = \sum_{i=0}^{7} b_i \cdot 2^{i}$, where $b_i$ is the bit at position $i$ (LSB at $i = 0$). The resulting byte array is then decoded as UTF-8.
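A minimal sketch of this reverse step, computing the positional sum explicitly instead of calling parseInt so that the arithmetic stays visible. It assumes every group has already been validated as exactly eight 0/1 characters; octetToByte and binaryToText are hypothetical names.

```javascript
// Convert one 8-character binary string to its unsigned byte value.
// The string is written MSB first, so bit i (LSB = 0) sits at index 7 - i.
function octetToByte(octet) {
  let value = 0;
  for (let i = 0; i < 8; i++) {
    if (octet[7 - i] === "1") value += 2 ** i;
  }
  return value;
}

// Collect all bytes, then let TextDecoder reassemble multi-byte sequences.
function binaryToText(octets) {
  const bytes = Uint8Array.from(octets, (octet) => octetToByte(octet));
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(binaryToText(["01001000", "01101001"])); // Hi
```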

Reference Data

Character	Code Point (decimal)	Binary (UTF-8 bytes)	Hex (UTF-8 bytes)	Description
A	65	01000001	41	Uppercase Latin A
Z	90	01011010	5A	Uppercase Latin Z
a	97	01100001	61	Lowercase Latin a
z	122	01111010	7A	Lowercase Latin z
0	48	00110000	30	Digit zero
9	57	00111001	39	Digit nine
(space)	32	00100000	20	Space character
!	33	00100001	21	Exclamation mark
@	64	01000000	40	At sign
#	35	00100011	23	Number/hash sign
.	46	00101110	2E	Period/full stop
,	44	00101100	2C	Comma
?	63	00111111	3F	Question mark
\n	10	00001010	0A	Line feed (newline)
\t	9	00001001	09	Horizontal tab
&	38	00100110	26	Ampersand
/	47	00101111	2F	Forward slash
\	92	01011100	5C	Backslash
~	126	01111110	7E	Tilde
€	8364	11100010 10000010 10101100	E282AC	Euro sign (3-byte UTF-8)
©	169	11000010 10101001	C2A9	Copyright sign (2-byte UTF-8)
π	960	11001111 10000000	CF80	Greek lowercase pi (2-byte UTF-8)

Frequently Asked Questions

The converter uses the browser's native TextEncoder API, which outputs UTF-8 byte sequences. A character like é produces 2 bytes (11000011 10101001), while an emoji like 😀 produces 4 bytes. When converting binary back to text, all bytes are collected into a Uint8Array and decoded with new TextDecoder('utf-8'), which correctly reassembles multi-byte sequences. If bytes form an invalid UTF-8 sequence, the Unicode replacement character U+FFFD (�) appears instead of corrupting the output silently.
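The round trip described above might look roughly like the following. TextEncoder and TextDecoder are the standard APIs; the wrapper names textToBinary and binaryToString, and the choice of a single space as separator, are assumptions made for this sketch.

```javascript
const utf8Encoder = new TextEncoder();
const utf8Decoder = new TextDecoder("utf-8");

// Text to space-separated 8-bit groups, one group per UTF-8 byte.
function textToBinary(text) {
  return Array.from(utf8Encoder.encode(text), (byte) =>
    byte.toString(2).padStart(8, "0")
  ).join(" ");
}

// Space-separated 8-bit groups back to text. Invalid UTF-8 sequences are
// replaced with U+FFFD because the decoder is not created in fatal mode.
function binaryToString(binary) {
  const bytes = Uint8Array.from(binary.split(" "), (octet) => parseInt(octet, 2));
  return utf8Decoder.decode(bytes);
}

console.log(textToBinary("\u00E9")); // 11000011 10101001
console.log(binaryToString("11000011 10101001")); // é
console.log(binaryToString("11000011") === "\uFFFD"); // true: lone lead byte
```

Passing { fatal: true } as the second constructor argument to TextDecoder would make invalid input throw a TypeError instead, which may suit callers that prefer a hard failure over a replacement character.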

Each byte requires exactly 8 bits. If the total number of binary digits is not divisible by 8, the converter displays a warning and left-pads the final incomplete chunk with leading zeros. For example, input 1000001 (7 bits) is treated as 01000001, which decodes to the character A. This matches the convention used by most binary-to-text tools, but the warning ensures you know the input was ambiguous.
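The padding rule can be sketched as follows, assuming the input is a continuous string of 0s and 1s. The name chunkBits and the shape of the return value are hypothetical; the padded flag stands in for the warning the converter shows.

```javascript
// Split a continuous bit string into groups of 8 from left to right and
// left-pad an incomplete final group with zeros.
function chunkBits(bits) {
  const chunks = [];
  for (let i = 0; i < bits.length; i += 8) {
    chunks.push(bits.slice(i, i + 8));
  }
  const last = chunks.length - 1;
  const padded = last >= 0 && chunks[last].length < 8;
  if (padded) {
    chunks[last] = chunks[last].padStart(8, "0");
  }
  return { chunks, padded };
}

console.log(chunkBits("1000001")); // { chunks: [ '01000001' ], padded: true }
```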

The converter auto-detects four delimiter formats: space-separated (01001000 01101001), comma-separated (01001000,01101001), tab-separated, and continuous stream with no delimiter (0100100001101001). For continuous streams, the parser chunks from left to right in groups of 8. You can also force a specific delimiter via the dropdown selector if auto-detection picks the wrong format.
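One plausible way to implement this kind of detection is shown below. The precedence order (comma, then tab, then space, then continuous) and the names detectDelimiter, splitBinary, and isBinaryChunk are assumptions for this sketch; the converter's actual heuristics may differ.

```javascript
// Guess the delimiter from the characters present in the input.
function detectDelimiter(input) {
  if (input.includes(",")) return "comma";
  if (input.includes("\t")) return "tab";
  if (input.trim().includes(" ")) return "space";
  return "none";
}

// Split the input into chunks according to the detected delimiter.
function splitBinary(input) {
  const trimmed = input.trim();
  switch (detectDelimiter(trimmed)) {
    case "comma":
      return trimmed.split(/\s*,\s*/);
    case "tab":
      return trimmed.split(/\t+/);
    case "space":
      return trimmed.split(/ +/);
    default:
      // Continuous stream: groups of 8 from left to right.
      return trimmed.match(/.{1,8}/g) || [];
  }
}

// A chunk is acceptable if it is 1 to 8 binary digits and nothing else.
function isBinaryChunk(chunk) {
  return /^[01]{1,8}$/.test(chunk);
}

console.log(splitBinary("01001000,01101001")); // [ '01001000', '01101001' ]
```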

Control characters are supported. Binary values from 00000000 to 00011111 (decimal 0 to 31) represent ASCII control characters such as NUL, TAB, and LF. The converter decodes them faithfully. However, most control characters are invisible in the output textarea. The byte/character count below the output helps verify they are present. Newlines (code point 10) and tabs (code point 9) are the control characters whose effect is most commonly visible.
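When verifying that control characters survived decoding, it can help to render them visibly. The sketch below maps each code point from 0 to 31 to the matching symbol in the Unicode Control Pictures block (U+2400 to U+241F). This is an illustrative debugging aid with a hypothetical name, showControlChars, and not a feature the converter is known to have.

```javascript
// Replace each C0 control character (0 to 31) with its visible
// Control Pictures counterpart, e.g. TAB (9) becomes U+2409.
function showControlChars(text) {
  return text.replace(/[\u0000-\u001F]/g, (ch) =>
    String.fromCharCode(0x2400 + ch.charCodeAt(0))
  );
}

console.log(showControlChars("A\tB\n") === "A\u2409B\u240A"); // true
```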

When another tool produces different binary for the same text, encoding differences are the most common cause. This tool uses UTF-8 exclusively. Some older tools assume ASCII (ignoring characters above code point 127) or use UTF-16/UCS-2 (where every character takes at least 16 bits). For pure ASCII text (English letters, digits, basic punctuation), ASCII and UTF-8 produce identical 8-bit output, while UTF-16 emits 16 bits per character. Outside the ASCII range the byte counts diverge further: the Euro sign € is 3 bytes in UTF-8 but 2 bytes in UTF-16.
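The size difference between the two encodings is easy to check. In this sketch the UTF-16 size is derived from the string's length, which in JavaScript counts UTF-16 code units (2 bytes each, no byte order mark); byteLengths is a hypothetical helper name.

```javascript
// Compare how many bytes a string occupies in UTF-8 and in UTF-16.
function byteLengths(text) {
  return {
    utf8: new TextEncoder().encode(text).length,
    utf16: text.length * 2, // String length is measured in 16-bit code units
  };
}

console.log(byteLengths("\u20AC")); // { utf8: 3, utf16: 2 }
console.log(byteLengths("A")); // { utf8: 1, utf16: 2 }
```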

The converter runs entirely in the browser with no server round-trip. Practical limits depend on available memory. Text areas handle approximately 1 to 2 million characters without lag on modern devices. For delimited binary input, where each byte takes 8 digits plus a delimiter (9 characters), that translates to roughly 110,000 to 220,000 bytes of decoded text; a continuous stream with no delimiters fits roughly 125,000 to 250,000 bytes. If you need to process larger payloads, consider a streaming approach or a dedicated command-line tool.
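For the streaming approach, TextDecoder accepts a stream option that buffers an incomplete multi-byte sequence until the next chunk arrives. A minimal sketch, assuming the payload has already been split into Uint8Array chunks (decodeInChunks is a hypothetical name):

```javascript
// Decode a sequence of byte chunks without concatenating them first.
// With { stream: true }, a multi-byte character split across two chunks
// is held back and completed by the following call.
function decodeInChunks(byteChunks) {
  const streamingDecoder = new TextDecoder("utf-8");
  let text = "";
  for (const chunk of byteChunks) {
    text += streamingDecoder.decode(chunk, { stream: true });
  }
  // A final call without arguments flushes any buffered bytes.
  text += streamingDecoder.decode();
  return text;
}

// The Euro sign (E2 82 AC) is split across the two chunks here.
const parts = [new Uint8Array([0x48, 0xe2, 0x82]), new Uint8Array([0xac, 0x21])];
console.log(decodeInChunks(parts)); // H€!
```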