User Rating 0.0 โ˜…โ˜…โ˜…โ˜…โ˜…
Total Usage 0 times
Bytes mode: enter decimal, hex, binary, or octal values
Quick Presets:
Is this tool helpful?

Your feedback helps us improve.

โ˜… โ˜… โ˜… โ˜… โ˜…

About

Raw byte sequences carry no meaning until decoded against a character encoding standard. A single misidentified byte in a UTF-8 stream produces the replacement character U+FFFD and corrupts every subsequent multi-byte codepoint in the sequence. This tool decodes byte arrays - expressed as decimal (0 - 255), hexadecimal (00 - FF), binary (00000000 - 11111111), or octal (0 - 377) - into their corresponding Unicode string using proper UTF-8 multi-byte reassembly. It also reverses the operation: any plaintext string converts back to its constituent byte values in your chosen base.

The converter handles all four UTF-8 byte lengths: single-byte ASCII (U+0000 - U+007F), two-byte sequences up to U+07FF, three-byte sequences covering the Basic Multilingual Plane up to U+FFFF, and four-byte sequences for supplementary planes up to U+10FFFF. Note: this tool assumes valid UTF-8 encoding. Malformed continuation bytes or overlong encodings are flagged as errors rather than silently accepted. Pro tip: when debugging network protocols or binary file headers, paste raw hex dumps directly - the delimiter auto-detection handles spaces, commas, 0x prefixes, and contiguous streams.

bytes to string byte array converter hex to text binary to string UTF-8 decoder byte converter decimal to text

Formulas

UTF-8 encodes Unicode codepoints into variable-length byte sequences. The encoding rule maps a codepoint U to 1 - 4 bytes based on its magnitude:

{
0xxxxxxx if U โ‰ค 0x7F (1 byte, 7 bits)110xxxxx 10xxxxxx if U โ‰ค 0x7FF (2 bytes, 11 bits)1110xxxx 10xxxxxx 10xxxxxx if U โ‰ค 0xFFFF (3 bytes, 16 bits)11110xxx 10xxxxxx 10xxxxxx 10xxxxxx if U โ‰ค 0x10FFFF (4 bytes, 21 bits)

To decode bytes back to a codepoint, extract the payload bits from each byte and concatenate. For a 2-byte sequence with lead byte b0 and continuation byte b1:

U = (b0 โˆง 0x1F) << 6 + (b1 โˆง 0x3F)

For base conversion of individual byte values, a byte B in decimal relates to other bases as:

Bdec = Bhex = Boct = Bbin , where 0 โ‰ค B โ‰ค 255

Where U = Unicode codepoint value, bn = the n-th byte in the UTF-8 sequence, B = a single byte value in range 0 - 255, and โˆง = bitwise AND, << = left bit-shift.

Reference Data

Byte Range (Dec)Hex RangeBinary PatternUTF-8 BytesUnicode RangeDescription
0 - 12700 - 7F0xxxxxxx1U+0000 - U+007FASCII (Latin letters, digits, punctuation)
194 - 223C2 - DF110xxxxx2U+0080 - U+07FFLatin Extended, Greek, Cyrillic, Arabic, Hebrew
224 - 239E0 - EF1110xxxx3U+0800 - U+FFFFCJK, Devanagari, Thai, BMP symbols
240 - 247F0 - F711110xxx4U+10000 - U+10FFFFEmoji, historic scripts, musical symbols
128 - 19180 - BF10xxxxxx - - Continuation byte (never starts a character)
192 - 193C0 - C1110000xx - - Overlong encoding (invalid in UTF-8)
248 - 255F8 - FF11111xxx - - Invalid UTF-8 lead bytes (RFC 3629)
3220001000001U+0020Space character
100A000010101U+000ALine Feed (LF / \n)
130D000011011U+000DCarriage Return (CR / \r)
909000010011U+0009Horizontal Tab (\t)
000000000001U+0000Null character (NUL)
239 187 191EF BB BF - 3U+FEFFUTF-8 BOM (Byte Order Mark)
239 191 189EF BF BD - 3U+FFFDReplacement Character (decoding error)
48 - 5730 - 390011xxxx1U+0030 - U+0039Digits 0-9
65 - 9041 - 5A01xxxxxx1U+0041 - U+005AUppercase A - Z
97 - 12261 - 7A01xxxxxx1U+0061 - U+007ALowercase a - z
240 159 152 128F0 9F 98 80 - 4U+1F600Grinning Face emoji ๐Ÿ˜€

Frequently Asked Questions

If you map each byte individually via Latin-1 or ASCII, any codepoint above U+007F produces wrong characters. UTF-8 uses lead bytes (with prefixes 110, 1110, 11110) to signal that 1-3 continuation bytes (prefix 10) follow. The converter reassembles these into a single codepoint. For example, the Euro sign โ‚ฌ is three bytes: E2 82 AC. Treating them as three Latin-1 characters yields รขย‚ยฌ instead.
The converter flags it as an invalid sequence and inserts the Unicode replacement character U+FFFD (๏ฟฝ). Bytes in the range 0x80 - 0xBF are continuation bytes and must never appear at the start of a character. Similarly, lead bytes 0xC0 - 0xC1 and 0xF8 - 0xFF are invalid per RFC 3629.
Yes. Emoji like ๐Ÿ˜€ (Grinning Face) occupy codepoint U+1F600, which requires a 4-byte UTF-8 sequence: F0 9F 98 80. The converter correctly reassembles all four bytes into the single emoji character. In the reverse direction (string to bytes), JavaScript's surrogate pairs are resolved to the true codepoint before encoding.
The auto-delimiter detection looks for spaces, commas, 0x prefixes, or contiguous pairs. If your hex string has odd-length segments (e.g., A 4F 3), the parser cannot form valid byte pairs. Ensure each hex value is exactly two characters (pad with a leading zero: 0A instead of A). When using the 0x prefix format, each value must be 0x00 - 0xFF.
In Decimal mode, each byte is output as a base-10 integer (0 - 255). In Hex mode, bytes appear as two-character hexadecimal values. Binary mode produces 8-digit binary strings. Octal mode outputs 1-3 digit octal values (range 0 - 377). All modes represent the same underlying UTF-8 byte sequence; only the numeric base changes.
The UTF-8 BOM is the three-byte sequence EF BB BF (codepoint U+FEFF). This converter does not strip it automatically - it decodes to the zero-width no-break space character. If your source data begins with these bytes, the output string will contain an invisible character at position 0. Remove the first three bytes manually if unwanted.