About

Character encoding errors corrupt data silently. A single misinterpreted byte in a Base64 payload can break an API response. A wrong hex sequence in a firmware flash can brick hardware. This tool converts between any standard numeral base (2, 8, 10, 16, 32, 36, 58, 64, 85) and valid UTF-8 text. It handles the full Unicode range, including multi-byte sequences for CJK characters and emoji, where a single code point like U+1F600 requires 4 bytes in UTF-8. The converter operates entirely client-side with no server round-trips.

Base58 uses the Bitcoin alphabet (no 0, O, I, l) to avoid visual ambiguity. Base85 follows the Ascii85 (btoa) variant used in PostScript and PDF. Base32 implements RFC 4648 with optional padding. Limitations: this tool assumes well-formed input. Malformed sequences (e.g., a lone surrogate in UTF-16 re-encoded as UTF-8) will produce a decoding error rather than garbage output.

Formulas

For numeric bases (Base b where b ∈ {2, 8, 10, 16, 36}), each UTF-8 byte is independently converted. Encoding a byte array to base b:

s_i = repr(byte_i, b) for each i ∈ [0, n)

where repr(v, b) converts integer v to its string representation in base b, zero-padded to a fixed width (e.g., 8 digits for binary, 2 for hex).
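The per-byte rule above can be sketched in a few lines. This is an illustration rather than the tool's actual source: the function name encodeBytes and the WIDTHS table are hypothetical, and the widths for bases 8, 10, and 36 are assumed from the number of digits needed for the largest byte value (255).

```typescript
// Digits needed to represent 255 in each base (assumed fixed widths).
const WIDTHS: Record<number, number> = { 2: 8, 8: 3, 10: 3, 16: 2, 36: 2 };

// Encode text as UTF-8, then convert every byte independently: s_i = repr(byte_i, b).
function encodeBytes(text: string, base: number): string[] {
  const width = WIDTHS[base];
  if (width === undefined) {
    throw new RangeError(`unsupported base: ${base}`);
  }
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) =>
    b.toString(base).toUpperCase().padStart(width, "0")
  );
}
```

With these assumptions, encodeBytes("A", 2) returns ["01000001"] and encodeBytes("😀", 16) returns ["F0", "9F", "98", "80"].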

Base64 groups 3 input bytes into 4 output characters using 6-bit indices:

n = (b₀ × 65536) + (b₁ × 256) + b₂
c_k = alphabet[(n >> (18 − 6k)) & 63] for k = 0..3
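As a rough sketch of the grouping formula, the function below applies it to each 3-byte group. The name base64Encode is hypothetical, and the handling of the final partial group (missing bytes treated as zero, unused positions filled with =) follows standard RFC 4648 padding rather than anything stated in the formula itself.

```typescript
const B64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function base64Encode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = Math.min(3, bytes.length - i);
    // Missing bytes in the final group are treated as zero.
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const n = b0 * 65536 + b1 * 256 + b2;
    for (let k = 0; k < 4; k++) {
      // A group of r bytes yields r + 1 data characters; the rest is padding.
      out += k <= remaining ? B64.charAt((n >> (18 - 6 * k)) & 63) : "=";
    }
  }
  return out;
}
```

For the single byte 0x41 ("A") this yields QQ==, matching the reference table.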

Base58 treats the entire byte array as a single big-endian unsigned integer and performs repeated division:

while N > 0:
    r = N mod 58
    N = ⌊N ÷ 58⌋
    prepend alphabet[r]

Leading zero bytes map to the character 1 (first in the Base58 alphabet), preserving byte-length information.
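The repeated-division loop and the leading-zero rule can be combined as in the sketch below. It uses the built-in bigint type to hold N. The function name base58Encode is hypothetical, and the tool itself may use a different big-number technique.

```typescript
const B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

function base58Encode(bytes: Uint8Array): string {
  // Each leading 0x00 byte becomes one "1" in the output.
  let zeros = 0;
  while (zeros < bytes.length && bytes[zeros] === 0) zeros++;

  // Build N, the big-endian integer value of the whole byte array.
  let n = 0n;
  for (let i = 0; i < bytes.length; i++) {
    n = n * 256n + BigInt(bytes[i]);
  }

  // Repeated division by 58, prepending one digit per step.
  let digits = "";
  while (n > 0n) {
    digits = B58.charAt(Number(n % 58n)) + digits;
    n = n / 58n; // bigint division truncates, i.e. floor for non-negative N
  }
  return "1".repeat(zeros) + digits;
}
```

The byte 0x41 encodes to 28, and the bytes 00 00 41 encode to 1128.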

Base85 (Ascii85) groups 4 bytes into a 32-bit integer and extracts 5 digits in base 85:

n = b₀ × 256³ + b₁ × 256² + b₂ × 256 + b₃
c_k = chr((⌊n ÷ 85^(4−k)⌋ mod 85) + 33) for k = 0..4

Where b_i = individual bytes, c_k = output characters, N = big integer value of the full byte array, alphabet = the character set for the given base, chr = ASCII character from code point.
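A minimal sketch of the Base85 group formula follows. The name ascii85Encode is hypothetical. The sketch assumes the usual Ascii85 rule that a final partial group of r bytes is zero-filled and emits only r + 1 characters, and it deliberately leaves out the z shortcut for all-zero groups and the <~ ~> delimiters.

```typescript
function ascii85Encode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 4) {
    const remaining = Math.min(4, bytes.length - i);
    // Build the 32-bit group value; missing trailing bytes count as zero.
    // Plain arithmetic is used instead of << to avoid signed 32-bit overflow.
    let n = 0;
    for (let j = 0; j < 4; j++) {
      n = n * 256 + (j < remaining ? bytes[i + j] : 0);
    }
    let group = "";
    for (let k = 0; k < 5; k++) {
      const digit = Math.floor(n / 85 ** (4 - k)) % 85;
      group += String.fromCharCode(digit + 33);
    }
    // A partial group of r bytes keeps only its first r + 1 characters.
    out += group.slice(0, remaining + 1);
  }
  return out;
}
```

The single byte 0x41 encodes to 5l, as in the reference table, and the four bytes of "Man " encode to 9jqo^.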

Reference Data

Base	Alphabet / Charset	Bits per Char	Use Case	Padding	Example ("A")
Base2 (Binary)	0 - 1	1	Low-level protocols, bitfields	None	01000001
Base8 (Octal)	0 - 7	3	Unix file permissions, legacy systems	None	101
Base10 (Decimal)	0 - 9	3.32	Human-readable byte values	None	65
Base16 (Hex)	0 - 9, A - F	4	Memory dumps, color codes, hashes	None	41
Base32 (RFC 4648)	A - Z, 2 - 7	5	TOTP tokens, DNS, case-insensitive IDs	=	IE======
Base36	0 - 9, A - Z	5.17	Short URLs, compact numeric IDs	None	1T
Base58 (Bitcoin)	1 - 9, A - H, J - N, P - Z, a - k, m - z	5.86	Bitcoin addresses, IPFS CIDs	None	28
Base64 (RFC 4648)	A - Z, a - z, 0 - 9, +, /	6	Email (MIME), data URIs, JWT	=	QQ==
Base85 (Ascii85)	! - u (ASCII 33 - 117)	6.41	PostScript, PDF streams	None	5l
Efficiency comparison for encoding 1024 random bytes
Base2	Output size		8192 chars (800% of input size, 700% overhead)
Base16	Output size		2048 chars (200% of input size, 100% overhead)
Base64	Output size		1368 chars (about 134% of input size, 34% overhead)
Base85	Output size		1280 chars (125% of input size, 25% overhead)
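The output sizes in the comparison can be reproduced with simple arithmetic, sketched below. The function name encodedLength is hypothetical. The Base64 figure includes = padding, and the Base85 figure assumes no z shortcut and no delimiters.

```typescript
// Number of output characters produced for a given number of input bytes.
function encodedLength(byteCount: number, base: number): number {
  switch (base) {
    case 2:
      return byteCount * 8;
    case 16:
      return byteCount * 2;
    case 64:
      // Every started 3-byte group yields 4 characters, padding included.
      return Math.ceil(byteCount / 3) * 4;
    case 85: {
      // Full 4-byte groups yield 5 characters; a partial group of r bytes yields r + 1.
      const rest = byteCount % 4;
      return Math.floor(byteCount / 4) * 5 + (rest === 0 ? 0 : rest + 1);
    }
    default:
      throw new RangeError(`unsupported base: ${base}`);
  }
}
```

For 1024 bytes this gives 8192, 2048, 1368, and 1280 characters for bases 2, 16, 64, and 85 respectively.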

Frequently Asked Questions

UTF-8 encodes code points above U+007F as multi-byte sequences (2-4 bytes). The converter uses the TextEncoder API to produce the correct byte sequence, then encodes each byte in the selected base. For example, the emoji 😀 (U+1F600) becomes 4 bytes: F0 9F 98 80 in hex. When decoding, all bytes are reassembled and passed through TextDecoder, which reconstructs the original code point. If the byte sequence is malformed (e.g., truncated mid-character), the decoder reports an error rather than producing replacement characters.
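The encode and decode steps described here map onto two standard Web APIs. The sketch below is illustrative (the helper names toUtf8 and fromUtf8 are hypothetical). The relevant detail is the fatal option, which makes TextDecoder throw instead of substituting U+FFFD replacement characters.

```typescript
function toUtf8(text: string): Uint8Array {
  return new TextEncoder().encode(text);
}

function fromUtf8(bytes: Uint8Array): string {
  // fatal: true makes decode() throw a TypeError on malformed input
  // rather than silently inserting replacement characters.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}
```

Passing only the first three bytes of 😀 (F0 9F 98) to fromUtf8 throws, because the sequence is truncated mid-character.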

Standard Base64 (RFC 4648 §4) uses + and / for the values 62 and 63, with = for padding. Base64URL (RFC 4648 §5) substitutes - for + and _ for /, and typically omits padding. This tool uses standard Base64. If you need Base64URL output, replace + with -, / with _, and strip trailing = characters from the result.
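The three substitutions can be applied in one pass over the converter's output. This is a small sketch, and the function name toBase64Url is hypothetical.

```typescript
// Convert standard Base64 text to unpadded Base64URL.
function toBase64Url(b64: string): string {
  return b64
    .replace(/\+/g, "-") // 62nd value: + becomes -
    .replace(/\//g, "_") // 63rd value: / becomes _
    .replace(/=+$/, ""); // strip trailing padding
}
```

For example, toBase64Url("QQ==") returns "QQ".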

Hex maps each byte independently to exactly 2 characters because 256 = 16². Because 58 is not a power of 2, a byte cannot map cleanly to a fixed number of Base58 digits. Instead, the entire input is treated as one large big-endian integer and divided repeatedly by 58. This produces a variable-length output and is computationally more expensive, but it avoids visually ambiguous characters (0, O, I, l), which is critical for cryptocurrency addresses where a single wrong character means lost funds.

The converter validates input against the exact alphabet of the selected base before processing. For hex, only 0-9 and A - F (case-insensitive) are accepted. For Base64, only A - Z, a - z, 0-9, +, /, and = are valid. Invalid characters trigger an error toast specifying which character is illegal and its position. Whitespace (spaces, newlines, tabs) is automatically stripped before validation as a convenience.
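A validation pass of this kind might look like the sketch below. The names findInvalidChar, InvalidChar, and HEX_ALPHABET are hypothetical, and reporting the position relative to the original input (whitespace included) is an assumption, since the text does not say which position is shown.

```typescript
interface InvalidChar {
  char: string;
  position: number; // zero-based index into the original input
}

const HEX_ALPHABET = "0123456789ABCDEFabcdef";

// Return the first character outside the alphabet, or null if the input is clean.
function findInvalidChar(input: string, alphabet: string): InvalidChar | null {
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (/\s/.test(ch)) continue; // whitespace is ignored, not rejected
    if (alphabet.indexOf(ch) === -1) {
      return { char: ch, position: i };
    }
  }
  return null;
}
```

With this sketch, findInvalidChar("41 4Z", HEX_ALPHABET) reports the character Z at position 4, while "41 42\n" passes.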

In Base58, each leading 0x00 byte in the input maps to the character "1" (the first character of the Base58 alphabet). These are counted and prepended to the result of the big-integer division. Without this step, leading zero bytes would be lost since they do not affect the numeric value. This is essential for Bitcoin address checksums where leading zeros carry semantic meaning.
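The reverse direction shows why the count matters: a decoder has to turn each leading "1" back into a 0x00 byte, because those digits contribute nothing to the integer value. The sketch below uses bigint and a hypothetical function name base58Decode. It is not taken from the tool's source.

```typescript
const B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

function base58Decode(text: string): Uint8Array {
  // Each leading "1" stands for one 0x00 byte.
  let zeros = 0;
  while (zeros < text.length && text.charAt(zeros) === "1") zeros++;

  // Accumulate the big integer value of the digit string.
  let n = 0n;
  for (let i = 0; i < text.length; i++) {
    const digit = B58.indexOf(text.charAt(i));
    if (digit === -1) {
      throw new Error(`invalid Base58 character at position ${i}`);
    }
    n = n * 58n + BigInt(digit);
  }

  // Extract bytes from least to most significant.
  const body: number[] = [];
  while (n > 0n) {
    body.unshift(Number(n % 256n));
    n = n / 256n;
  }

  // Restore the leading zero bytes in front of the numeric part.
  const out = new Uint8Array(zeros + body.length);
  out.set(body, zeros);
  return out;
}
```

Decoding 1128 yields the bytes 00 00 41, whereas 28 yields only 41.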

Yes, Base32 input is accepted in either case. RFC 4648 §6 describes Base32 as an encoding for contexts that need to be case-insensitive, and defines its alphabet in uppercase. This converter normalizes all Base32 input to uppercase before decoding. Padding characters (=) are optional during decoding: the converter infers missing padding from the input length. However, if padding is present, it must be correct (the input length including padding must be a multiple of 8).
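The normalization and padding rules can be sketched as a pre-processing step ahead of the actual decoder. The function name normalizeBase32 is hypothetical. The sketch also rejects data lengths that no byte sequence can produce (1, 3, or 6 characters past a multiple of 8), and it does not check the alphabet.

```typescript
// Uppercase the input and return it with canonical = padding.
function normalizeBase32(input: string): string {
  const upper = input.toUpperCase();
  const data = upper.replace(/=+$/, "");
  const hadPadding = data.length !== upper.length;

  // Padding, when present, must bring the total length to a multiple of 8.
  if (hadPadding && upper.length % 8 !== 0) {
    throw new Error("incorrect Base32 padding");
  }

  // 1..5 input bytes produce 2, 4, 5, 7, or 8 characters, so these remainders are impossible.
  const rem = data.length % 8;
  if (rem === 1 || rem === 3 || rem === 6) {
    throw new Error("invalid Base32 length");
  }

  // Infer the missing padding from the data length.
  return data + "=".repeat((8 - rem) % 8);
}
```

Here normalizeBase32("ie") returns "IE======", while "IE==" is rejected because its padded length is not a multiple of 8.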

The tool accepts inputs up to 100 KB of text. For Base2 (binary) encoding, 100 KB of UTF-8 text produces approximately 800 KB of output (8× expansion). All processing runs synchronously in the main thread, which handles 100 KB inputs in under 50 ms on modern hardware. For substantially larger payloads, a streaming approach with Web Workers would be necessary, which exceeds the scope of a client-side single-page tool.