About

Decoding bytes with the wrong encoding corrupts text. When a file saved as UTF-8 is read as Latin-1, every multibyte character turns into garbage (mojibake), and the original content may be unrecoverable. This tool converts raw byte sequences, expressed as decimal, hexadecimal, octal, or binary values, into human-readable strings using the exact encoding you specify: UTF-8, ASCII (0 - 127), ISO-8859-1 (Latin-1, 0 - 255), or UTF-16. It also works in reverse: paste a string and receive its byte representation in any of those bases.

The converter uses the browser's native TextDecoder and TextEncoder APIs, which implement the WHATWG Encoding Standard, so results match other runtimes that implement the same standard. Note that strict ASCII mode rejects any byte above 127. UTF-8 sequences with invalid continuation bytes produce the replacement character U+FFFD rather than silent corruption. For UTF-16, byte order matters: select the correct endianness, or each pair of bytes will be read in swapped order and decode to the wrong characters.
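As a rough illustration of the decoding path described above, the following sketch wraps TextDecoder in a small function. The name decodeBytes is hypothetical and is not taken from the tool's source; only the "utf-8" and "utf-16le" labels are used here.

```javascript
// Hypothetical helper (not the tool's actual source): decode byte values
// with the standard TextDecoder API under a given encoding label.
function decodeBytes(byteValues, label) {
  // Each value is assumed to be an integer in the range 0-255 already.
  const bytes = Uint8Array.from(byteValues);
  return new TextDecoder(label).decode(bytes);
}

console.log(decodeBytes([72, 101, 108, 108, 111], "utf-8"));    // Hello
console.log(decodeBytes([0xC3, 0xA9], "utf-8"));                // é
console.log(decodeBytes([0x48, 0x00, 0x69, 0x00], "utf-16le")); // Hi
```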

Formulas

Byte-to-string conversion applies a decoding function D that maps an ordered sequence of byte values to Unicode code points, then renders those code points as characters.

$$S = D_{\mathrm{enc}}(b_0, b_1, \ldots, b_{n-1})$$

Where S is the output string, D_enc is the decoder for encoding enc, and each b_i is a byte value in range 0 - 255.

Input parsing converts a text token t in base r to its integer byte value:

b = parseInt(t, r) where r ∈ {2, 8, 10, 16}
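One caveat with the formula above is that parseInt stops at the first invalid character instead of failing (parseInt("12z", 10) returns 12), so a token usually needs to be validated first. A minimal sketch, assuming a hypothetical helper named parseByteToken and an assumed policy of rejecting anything outside 0 - 255:

```javascript
// Digits allowed in each supported radix.
const DIGIT_PATTERNS = new Map([
  [2, /^[01]+$/],
  [8, /^[0-7]+$/],
  [10, /^[0-9]+$/],
  [16, /^[0-9a-fA-F]+$/],
]);

// Hypothetical helper: convert one text token t in base r to a byte value.
function parseByteToken(token, radix) {
  const pattern = DIGIT_PATTERNS.get(radix);
  if (!pattern) throw new RangeError(`unsupported radix: ${radix}`);
  // Validate up front because parseInt silently ignores trailing garbage.
  if (!pattern.test(token)) {
    throw new SyntaxError(`"${token}" is not a base-${radix} number`);
  }
  const value = parseInt(token, radix);
  if (value > 255) throw new RangeError(`${token} does not fit in one byte`);
  return value;
}

console.log(parseByteToken("48", 16));      // 72
console.log(parseByteToken("110", 8));      // 72
console.log(parseByteToken("01001000", 2)); // 72
```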

For UTF-8, a code point U determines how many bytes encode it:

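$$
\begin{cases}
1 \text{ byte} & \text{if } U \le \texttt{0x7F} \\
2 \text{ bytes} & \text{if } U \le \texttt{0x7FF} \\
3 \text{ bytes} & \text{if } U \le \texttt{0xFFFF} \\
4 \text{ bytes} & \text{if } U \le \texttt{0x10FFFF}
\end{cases}
$$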
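The thresholds above are checked in order, so the first matching case wins. A minimal sketch of that rule, using a hypothetical function name utf8ByteLength and cross-checking one value against TextEncoder:

```javascript
// Hypothetical helper: number of UTF-8 bytes needed for code point U.
// Surrogate code points (U+D800 - U+DFFF) are not valid on their own in
// UTF-8; this sketch does not treat them specially.
function utf8ByteLength(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError("code point must be an integer in 0 - 0x10FFFF");
  }
  if (codePoint <= 0x7F) return 1;
  if (codePoint <= 0x7FF) return 2;
  if (codePoint <= 0xFFFF) return 3;
  return 4;
}

console.log(utf8ByteLength(0x41));    // 1 ("A")
console.log(utf8ByteLength(0xE9));    // 2 ("é")
console.log(utf8ByteLength(0x20AC));  // 3 ("€")
console.log(utf8ByteLength(0x1F600)); // 4 ("😀")

// Cross-check against the platform encoder for one sample value.
console.log(new TextEncoder().encode(String.fromCodePoint(0x20AC)).length); // 3
```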

For the reverse operation (String → Bytes), the TextEncoder API produces UTF-8 byte sequences. For single-byte encodings, each character's code point maps directly: b = charCodeAt(i).
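The reverse direction can be sketched as follows. Both function names are hypothetical, and the range check in encodeSingleByte is an assumption about how out-of-range characters should be handled, since charCodeAt alone happily returns values above 255.

```javascript
// Hypothetical helper: UTF-8 bytes of a string via the standard TextEncoder.
function encodeUtf8(text) {
  return Array.from(new TextEncoder().encode(text));
}

// Hypothetical helper for single-byte encodings: b = charCodeAt(i).
// maxValue is 255 for Latin-1 and 127 for ASCII.
function encodeSingleByte(text, maxValue = 255) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const b = text.charCodeAt(i);
    if (b > maxValue) {
      throw new RangeError(`character at index ${i} does not fit (code ${b})`);
    }
    bytes.push(b);
  }
  return bytes;
}

console.log(encodeUtf8("é"));       // [ 195, 169 ]
console.log(encodeSingleByte("é")); // [ 233 ]
```

The same character therefore yields two bytes in UTF-8 and one byte in Latin-1, which is why the encoding has to be chosen explicitly.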

Where b = byte value, t = input token string, r = radix (number base), U = Unicode code point, n = total byte count, S = decoded output string, enc = encoding label.

Reference Data

Encoding	Byte Range	Max Bytes/Char	BOM	Standard	Typical Use
ASCII	0 - 127	1	None	ANSI X3.4-1968	Legacy protocols, RFC headers
ISO-8859-1 (Latin-1)	0 - 255	1	None	ISO/IEC 8859-1:1998	Western European text, HTTP default
UTF-8	0 - 255	4	EF BB BF (optional)	RFC 3629	Web (93%+ of pages), JSON, XML
UTF-16 LE	0 - 255	4	FF FE	RFC 2781	Windows internals, Java strings
UTF-16 BE	0 - 255	4	FE FF	RFC 2781	Network byte order, older Mac OS
Windows-1252	0 - 255	1	None	Microsoft	Legacy Windows apps, emails

Common Byte Representations
Format	Base	Example
Decimal	Base 10	72 101 108 108 111 → "Hello"
Hexadecimal	Base 16	48 65 6C 6C 6F → "Hello"
Octal	Base 8	110 145 154 154 157 → "Hello"
Binary	Base 2	01001000 01100101 → "He"

UTF-8 Multibyte Structure
Bit Pattern	Length	Code Point Range
0xxxxxxx	1 byte	U+0000 - U+007F (ASCII compatible)
110xxxxx 10xxxxxx	2 bytes	U+0080 - U+07FF (Latin, Greek, Cyrillic)
1110xxxx 10xxxxxx 10xxxxxx	3 bytes	U+0800 - U+FFFF (CJK, most BMP)
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx	4 bytes	U+10000 - U+10FFFF (Emoji, rare scripts)

Printable ASCII Quick Reference
Decimal Code	Characters
32 (0x20)	Space
33 - 47	Punctuation set 1
48 - 57	Digits 0 - 9
65 - 90	A - Z uppercase
97 - 122	a - z lowercase
123 - 126	{ | } ~
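To connect the byte-representation and UTF-8 structure tables, the sketch below packs a code point into the bit patterns listed above and prints the result in a chosen base. The function names and the zero-padding widths are assumptions made for illustration, not the tool's documented behavior.

```javascript
// Hypothetical helper: encode one code point by hand using the UTF-8 layout.
function encodeCodePointUtf8(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
    throw new RangeError("not an encodable code point");
  }
  if (cp <= 0x7F) return [cp];                                   // 0xxxxxxx
  if (cp <= 0x7FF) return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]; // 110xxxxx 10xxxxxx
  if (cp <= 0xFFFF) {
    return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)];
  }
  return [
    0xF0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3F),
    0x80 | ((cp >> 6) & 0x3F),
    0x80 | (cp & 0x3F),
  ];
}

// Hypothetical helper: render bytes in base 2, 8, 10, or 16.
// Padding widths (8, 3, none, 2 digits) are an assumed display convention.
function formatBytes(byteValues, radix) {
  const width = new Map([[2, 8], [8, 3], [10, 1], [16, 2]]).get(radix);
  if (!width) throw new RangeError(`unsupported radix: ${radix}`);
  return byteValues
    .map((b) => b.toString(radix).toUpperCase().padStart(width, "0"))
    .join(" ");
}

console.log(formatBytes(encodeCodePointUtf8(0xE9), 2));     // 11000011 10101001
console.log(formatBytes(encodeCodePointUtf8(0x20AC), 16));  // E2 82 AC
console.log(formatBytes(encodeCodePointUtf8(0x1F600), 16)); // F0 9F 98 80
console.log(formatBytes([72, 101, 108, 108, 111], 8));      // 110 145 154 154 157
```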

Frequently Asked Questions

The WHATWG Encoding Standard mandates that a conforming UTF-8 decoder replaces each malformed subsequence with the Unicode Replacement Character U+FFFD (�). This tool uses the browser's native TextDecoder with fatal mode disabled, so you will see � symbols in the output wherever bytes violate the UTF-8 structure - for example, a lone continuation byte (0x80 - 0xBF) or an incomplete multibyte sequence. Enable strict ASCII mode if you need to reject any byte above 127 outright.
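The difference between the default mode and fatal mode of TextDecoder can be seen in a few lines. This is a sketch of the standard API's behavior, not the tool's code; tryDecodeStrict is a hypothetical wrapper.

```javascript
const lenient = new TextDecoder("utf-8");                  // fatal defaults to false
const strict = new TextDecoder("utf-8", { fatal: true });  // throws on malformed input

// Hypothetical wrapper: return null instead of throwing on malformed bytes.
// The Encoding Standard specifies a TypeError in fatal mode.
function tryDecodeStrict(byteValues) {
  try {
    return strict.decode(Uint8Array.from(byteValues));
  } catch (err) {
    return null;
  }
}

// 0x80 is a lone continuation byte between "H" and "i".
const malformed = [0x48, 0x80, 0x69];
console.log(lenient.decode(Uint8Array.from(malformed))); // H\uFFFDi, shown as H�i
console.log(tryDecodeStrict(malformed));                 // null
console.log(tryDecodeStrict([0x48, 0x69]));              // Hi
```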

UTF-16 encodes each code point as one or two 16-bit code units. A 16-bit unit requires two bytes, and their order depends on endianness. In Little Endian (UTF-16 LE), the least significant byte comes first: the character "A" (U+0041) is stored as bytes 41 00. In Big Endian (UTF-16 BE), the same character is 00 41. Selecting the wrong endianness typically produces CJK characters or null bytes instead of the expected Latin text. If your byte source includes a BOM (FF FE or FE FF), use it to determine the correct setting.
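To make the byte-order effect concrete, the sketch below assembles 16-bit code units by hand with DataView rather than going through TextDecoder. Both function names are hypothetical. Unlike a conforming decoder, this simplified version ignores a trailing odd byte and does not replace unpaired surrogates.

```javascript
// Hypothetical helper: report the byte order implied by a leading BOM, if any.
function detectUtf16Bom(byteValues) {
  if (byteValues.length >= 2) {
    if (byteValues[0] === 0xFF && byteValues[1] === 0xFE) return "le";
    if (byteValues[0] === 0xFE && byteValues[1] === 0xFF) return "be";
  }
  return null;
}

// Hypothetical helper: read 16-bit code units in the given byte order.
function decodeUtf16(byteValues, endianness) {
  const bytes = Uint8Array.from(byteValues);
  const view = new DataView(bytes.buffer);
  let out = "";
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    // The second argument of getUint16 is true for little-endian.
    out += String.fromCharCode(view.getUint16(i, endianness === "le"));
  }
  return out;
}

console.log(decodeUtf16([0x41, 0x00], "le")); // A
console.log(decodeUtf16([0x00, 0x41], "be")); // A
// Wrong byte order: U+4100, a CJK ideograph, instead of "A".
console.log(decodeUtf16([0x41, 0x00], "be").codePointAt(0).toString(16)); // 4100
console.log(detectUtf16Bom([0xFF, 0xFE, 0x41, 0x00])); // le
```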

ASCII defines only 128 characters (bytes 0 - 127). Bytes 128 - 255 are undefined in ASCII and will either be rejected or produce the replacement character. ISO-8859-1 (Latin-1) extends the range to 256 values: bytes 160 - 255 map to accented Latin characters, currency symbols, and typographic marks, while bytes 128 - 159 are control characters. For example, byte 233 is invalid in strict ASCII but decodes to "é" in Latin-1. Windows-1252 reassigns most of bytes 128 - 159 to printable glyphs such as the curly quotes “ and ” (bytes 147 and 148) and the en dash – (byte 150).
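The three single-byte behaviors can be compared side by side. The sketch below is hand-written for illustration: decodeByte is a hypothetical function, and CP1252_SAMPLE is a deliberately partial table covering only four Windows-1252 bytes, not the full mapping.

```javascript
// Partial, hand-written sample of Windows-1252 mappings in 0x80 - 0x9F.
const CP1252_SAMPLE = new Map([
  [0x80, "\u20AC"], // euro sign
  [0x93, "\u201C"], // left curly double quote
  [0x94, "\u201D"], // right curly double quote
  [0x96, "\u2013"], // en dash
]);

// Hypothetical helper: decode one byte under a named single-byte encoding.
function decodeByte(b, encoding) {
  switch (encoding) {
    case "ascii":
      if (b > 127) throw new RangeError(`byte ${b} is not valid ASCII`);
      return String.fromCharCode(b);
    case "latin1":
      // Latin-1 byte values equal Unicode code points U+0000 - U+00FF.
      return String.fromCharCode(b);
    case "windows-1252":
      if (b >= 0x80 && b <= 0x9F) {
        if (!CP1252_SAMPLE.has(b)) throw new RangeError(`byte ${b} is not in the sample table`);
        return CP1252_SAMPLE.get(b);
      }
      return String.fromCharCode(b);
    default:
      throw new RangeError(`unsupported encoding: ${encoding}`);
  }
}

console.log(decodeByte(233, "latin1"));       // é
console.log(decodeByte(147, "windows-1252")); // “
console.log(decodeByte(147, "latin1").charCodeAt(0)); // 147, a control character
```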

Technically, any byte sequence can be fed through a text decoder, but non-text binary data (images, compressed archives, executables) will produce meaningless character sequences full of replacement characters and control codes. The output will not be useful. If you need to inspect raw binary, use the hex output format in String-to-Bytes mode instead - it preserves every byte faithfully without interpretation loss. This tool is designed for byte sequences that genuinely represent encoded text.

The converter auto-detects common separators: spaces (72 101 108), commas (72,101,108), comma-space (72, 101, 108), and newlines. For hexadecimal input, you may also use the 0x prefix notation (0x48 0x65 0x6C) or no separator with consistent two-character grouping (48656C). Binary values should be space-separated with consistent 8-bit grouping. Mixing separators within one input may cause parsing errors.
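One way the separator handling described above could work is sketched below. The function names tokenize and parseBytes are hypothetical, and the exact rules (for example, treating an unbroken run of hex digits longer than two characters as packed byte pairs) are assumptions rather than the tool's documented algorithm.

```javascript
// Hypothetical helper: split raw input into per-byte tokens.
function tokenize(input, radix) {
  const trimmed = input.trim();
  if (trimmed === "") return [];
  // Packed hex with no separators, e.g. "48656C": take two digits per byte.
  if (radix === 16 && trimmed.length > 2 && /^[0-9a-fA-F]+$/.test(trimmed)) {
    if (trimmed.length % 2 !== 0) throw new SyntaxError("odd number of hex digits");
    return trimmed.match(/.{2}/g);
  }
  // Spaces, commas, comma-space, and newlines all count as separators.
  return trimmed
    .split(/[\s,]+/)
    .filter((t) => t !== "")
    .map((t) => (radix === 16 ? t.replace(/^0x/i, "") : t));
}

// Hypothetical helper: tokens to byte values, rejecting anything outside 0 - 255.
function parseBytes(input, radix) {
  return tokenize(input, radix).map((t) => {
    const value = parseInt(t, radix);
    if (Number.isNaN(value) || value < 0 || value > 255) {
      throw new RangeError(`"${t}" is not a byte in base ${radix}`);
    }
    return value;
  });
}

console.log(parseBytes("72, 101, 108", 10));     // [ 72, 101, 108 ]
console.log(parseBytes("0x48 0x65 0x6C", 16));   // [ 72, 101, 108 ]
console.log(parseBytes("48656C", 16));           // [ 72, 101, 108 ]
console.log(parseBytes("01001000 01100101", 2)); // [ 72, 101 ]
```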

The converter processes byte arrays synchronously in the main thread. For typical use cases under 100,000 bytes (approximately 100 KB), conversion is instantaneous. Inputs up to 500,000 bytes remain responsive on modern hardware. Beyond that, the textarea rendering itself becomes the bottleneck. The tool enforces a soft limit of 500 KB of input text and will display a warning if exceeded. For very large files, consider a command-line tool like xxd or iconv.