Bytes to String Converter
Convert byte arrays (decimal, hex, binary, octal) to readable text strings with full UTF-8 decoding. Supports bidirectional conversion.
About
Raw byte sequences carry no meaning until decoded against a character encoding standard. A single wrong or missing byte in a UTF-8 stream invalidates the multi-byte sequence it belongs to, which a tolerant decoder renders as the replacement character U+FFFD. Because UTF-8 is self-synchronizing, decoding resumes cleanly at the next lead byte, so the damage stays local; bytes interpreted under the wrong encoding altogether, by contrast, garble every non-ASCII character. This tool decodes byte arrays, expressed as decimal (0 - 255), hexadecimal (00 - FF), binary (00000000 - 11111111), or octal (0 - 377), into their corresponding Unicode string using proper UTF-8 multi-byte reassembly. It also reverses the operation: any plaintext string converts back to its constituent byte values in your chosen base.
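The two directions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual code: the function names `bytes_text_to_string` and `string_to_bytes_text` are hypothetical, and the input is assumed to be separated by spaces or commas.

```python
def bytes_text_to_string(text: str, base: int) -> str:
    """Parse space- or comma-separated byte values written in `base`
    (2, 8, 10 or 16) and decode them as UTF-8."""
    tokens = text.replace(",", " ").split()
    values = [int(tok, base) for tok in tokens]
    if any(v < 0 or v > 255 for v in values):
        raise ValueError("byte values must lie in the range 0..255")
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError.
    return bytes(values).decode("utf-8")


def string_to_bytes_text(s: str, base: int) -> str:
    """Encode `s` as UTF-8 and print each byte in the chosen base."""
    fmt = {2: "08b", 8: "o", 10: "d", 16: "02X"}[base]
    return " ".join(format(b, fmt) for b in s.encode("utf-8"))


print(bytes_text_to_string("72 105", 10))   # Hi
print(string_to_bytes_text("é", 16))        # C3 A9
```

Binary and hexadecimal output is zero-padded to a fixed width here so that contiguous streams stay unambiguous; decimal and octal are left unpadded, matching the ranges quoted above.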
The converter handles all four UTF-8 byte lengths: single-byte ASCII (U+0000 - U+007F), two-byte sequences up to U+07FF, three-byte sequences covering the rest of the Basic Multilingual Plane up to U+FFFF, and four-byte sequences for supplementary planes up to U+10FFFF. Note: this tool expects valid UTF-8 input. Malformed continuation bytes and overlong encodings are flagged as errors rather than silently accepted. Pro tip: when debugging network protocols or binary file headers, paste raw hex dumps directly; the delimiter auto-detection handles spaces, commas, 0x prefixes, and contiguous streams.
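One way such delimiter handling might work for hexadecimal input is sketched below. The tool's real detection logic is not documented here, so `parse_hex_dump` is a hypothetical helper with its own assumptions: tokens of one or two digits are single bytes, and longer tokens are read as contiguous two-digit pairs.

```python
import re


def parse_hex_dump(dump: str) -> bytes:
    """Turn a pasted hex dump into bytes.

    Accepts spaces, commas, 0x prefixes, and contiguous runs of hex digits.
    """
    out = bytearray()
    for raw in re.split(r"[\s,]+", dump.strip()):
        if not raw:
            continue
        tok = re.sub(r"^0[xX]", "", raw)      # drop an optional 0x prefix
        if len(tok) <= 2:
            out.append(int(tok, 16))          # a single byte such as "A" or "0A"
        elif len(tok) % 2 == 0:
            out.extend(bytes.fromhex(tok))    # contiguous stream such as "48656C"
        else:
            raise ValueError(f"odd-length hex run: {raw!r}")
    return bytes(out)


print(parse_hex_dump("0x48, 0x69").decode("utf-8"))   # Hi

# Python's strict UTF-8 decoder rejects the overlong sequence C0 80:
try:
    parse_hex_dump("C0 80").decode("utf-8")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```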
Formulas
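UTF-8 encodes Unicode codepoints into variable-length byte sequences. The encoding rule maps a codepoint U to 1 - 4 bytes based on its magnitude:

$$
U \mapsto
\begin{cases}
\texttt{0xxxxxxx} & \text{U+0000} \le U \le \text{U+007F} \\
\texttt{110xxxxx 10xxxxxx} & \text{U+0080} \le U \le \text{U+07FF} \\
\texttt{1110xxxx 10xxxxxx 10xxxxxx} & \text{U+0800} \le U \le \text{U+FFFF} \\
\texttt{11110xxx 10xxxxxx 10xxxxxx 10xxxxxx} & \text{U+10000} \le U \le \text{U+10FFFF}
\end{cases}
$$

The x positions hold the binary digits of U, most significant bit first.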
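As a rough sketch of that mapping, the function below builds the bytes by hand with shifts and masks. The name `utf8_encode` is illustrative only; in practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode codepoint as UTF-8 (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("codepoint outside U+0000..U+10FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        # RFC 3629 forbids encoding UTF-16 surrogates.
        raise ValueError("surrogate codepoints cannot be encoded")
    if cp <= 0x7F:
        return bytes([cp])                              # 0xxxxxxx
    if cp <= 0x7FF:
        return bytes([0xC0 | (cp >> 6),                 # 110xxxxx
                      0x80 | (cp & 0x3F)])              # 10xxxxxx
    if cp <= 0xFFFF:
        return bytes([0xE0 | (cp >> 12),                # 1110xxxx
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),                    # 11110xxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(utf8_encode(0x1F600).hex(" ").upper())   # F0 9F 98 80
```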
To decode bytes back to a codepoint, extract the payload bits from each byte and concatenate. For a 2-byte sequence with lead byte b0 and continuation byte b1:

$$
U = \bigl((b_0 \wedge \texttt{0x1F}) \mathbin{<<} 6\bigr) + (b_1 \wedge \texttt{0x3F})
$$
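A worked instance of the 2-byte case, written as a small Python function. The name `decode_two_byte` and the range checks are additions for illustration; they mirror the lead-byte (C2 - DF) and continuation-byte (80 - BF) ranges in the reference table.

```python
def decode_two_byte(b0: int, b1: int) -> int:
    """Recover the codepoint from a 2-byte UTF-8 sequence."""
    if not 0xC2 <= b0 <= 0xDF:
        raise ValueError("b0 is not a valid 2-byte lead byte")
    if not 0x80 <= b1 <= 0xBF:
        raise ValueError("b1 is not a continuation byte")
    # 5 payload bits from the lead byte, 6 from the continuation byte.
    return ((b0 & 0x1F) << 6) + (b1 & 0x3F)


# C3 A9: (0x03 << 6) + 0x29 = 0xC0 + 0x29 = 0xE9, which is U+00E9
cp = decode_two_byte(0xC3, 0xA9)
print(f"U+{cp:04X}", chr(cp))   # U+00E9 é
```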
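For base conversion of individual byte values, a byte B in decimal relates to other bases through positional expansion, where d_i is the digit at position i (counted from the right, starting at 0) in base r:

$$
B = \sum_{i} d_i \cdot r^{i}, \qquad r \in \{2, 8, 16\}
$$

For example, 65 = 4·16 + 1 (hex 41), 65 = 1·64 + 0·8 + 1 (octal 101), and 65 = 2^6 + 2^0 (binary 01000001).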
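The same relationship can be checked quickly with Python's built-in `format` and `int`. The helper name `byte_in_all_bases` is hypothetical, and the zero-padding widths are a presentation choice rather than something the tool is known to use.

```python
def byte_in_all_bases(value: int) -> dict:
    """Render one byte value in decimal, hex, binary and octal."""
    if not 0 <= value <= 255:
        raise ValueError("a byte must lie in the range 0..255")
    return {
        "decimal": str(value),
        "hex": format(value, "02X"),     # two hex digits, 00..FF
        "binary": format(value, "08b"),  # eight bits
        "octal": format(value, "o"),     # 0..377
    }


print(byte_in_all_bases(65))
# {'decimal': '65', 'hex': '41', 'binary': '01000001', 'octal': '101'}

# Going the other way, int() parses a digit string in a given base:
print(int("41", 16), int("01000001", 2), int("101", 8))   # 65 65 65
```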
Where U = Unicode codepoint value, bn = the n-th byte in the UTF-8 sequence, B = a single byte value in range 0 - 255, ∧ = bitwise AND, and << = left bit-shift.
Reference Data
| Byte Range (Dec) | Hex Range | Binary Pattern | UTF-8 Bytes | Unicode Range | Description |
|---|---|---|---|---|---|
| 0 - 127 | 00 - 7F | 0xxxxxxx | 1 | U+0000 - U+007F | ASCII (Latin letters, digits, punctuation) |
| 194 - 223 | C2 - DF | 110xxxxx | 2 | U+0080 - U+07FF | Latin Extended, Greek, Cyrillic, Arabic, Hebrew |
| 224 - 239 | E0 - EF | 1110xxxx | 3 | U+0800 - U+FFFF | CJK, Devanagari, Thai, BMP symbols |
| 240 - 244 | F0 - F4 | 11110xxx | 4 | U+10000 - U+10FFFF | Emoji, historic scripts, musical symbols |
| 128 - 191 | 80 - BF | 10xxxxxx | - | - | Continuation byte (never starts a character) |
| 192 - 193 | C0 - C1 | 1100000x | - | - | Overlong encoding (invalid in UTF-8) |
| 245 - 255 | F5 - FF | 11110101 - 11111111 | - | - | Invalid UTF-8 lead bytes (RFC 3629) |
| 32 | 20 | 00100000 | 1 | U+0020 | Space character |
| 10 | 0A | 00001010 | 1 | U+000A | Line Feed (LF / \n) |
| 13 | 0D | 00001101 | 1 | U+000D | Carriage Return (CR / \r) |
| 9 | 09 | 00001001 | 1 | U+0009 | Horizontal Tab (\t) |
| 0 | 00 | 00000000 | 1 | U+0000 | Null character (NUL) |
| 239 187 191 | EF BB BF | - | 3 | U+FEFF | UTF-8 BOM (Byte Order Mark) |
| 239 191 189 | EF BF BD | - | 3 | U+FFFD | Replacement Character (decoding error) |
| 48 - 57 | 30 - 39 | 0011xxxx | 1 | U+0030 - U+0039 | Digits 0-9 |
| 65 - 90 | 41 - 5A | 010xxxxx | 1 | U+0041 - U+005A | Uppercase A - Z |
| 97 - 122 | 61 - 7A | 011xxxxx | 1 | U+0061 - U+007A | Lowercase a - z |
| 240 159 152 128 | F0 9F 98 80 | - | 4 | U+1F600 | Grinning Face emoji 😀 |
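The byte ranges in the table can be folded into a small classifier, sketched here in Python. The function name and the returned labels are made up for illustration. One deliberate choice: bytes F5 - F7 fit the 11110xxx pattern but could only encode values above U+10FFFF, so this sketch treats them as invalid in line with RFC 3629.

```python
def classify_byte(b: int) -> str:
    """Say what role a single byte can play in a UTF-8 stream."""
    if not 0 <= b <= 255:
        raise ValueError("a byte must lie in the range 0..255")
    if b <= 0x7F:
        return "ascii"          # 0xxxxxxx, a complete 1-byte character
    if b <= 0xBF:
        return "continuation"   # 10xxxxxx, never starts a character
    if b <= 0xC1:
        return "invalid"        # C0, C1: could only start overlong encodings
    if b <= 0xDF:
        return "lead2"          # 110xxxxx, starts a 2-byte sequence
    if b <= 0xEF:
        return "lead3"          # 1110xxxx, starts a 3-byte sequence
    if b <= 0xF4:
        return "lead4"          # 11110xxx, starts a 4-byte sequence
    return "invalid"            # F5..FF never appear in valid UTF-8


print([classify_byte(b) for b in bytes.fromhex("F09F9880")])
# ['lead4', 'continuation', 'continuation', 'continuation']
```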