About

Base62 encoding maps arbitrary binary data onto a 62-character alphabet: 0-9, A-Z, a-z. Unlike Base64, it avoids +, /, and = padding, producing URL-safe, filename-safe output without percent-encoding overhead. This matters in systems where non-alphanumeric characters cause parsing failures: REST path segments, short-link services, database keys with collation constraints, and distributed trace IDs. An incorrect encoding choice can silently corrupt data when it passes through middleware that strips or escapes special characters.

This tool converts UTF-8 strings to their Base62 representation by treating the byte sequence as a big-endian unsigned integer, repeatedly dividing it by 62, and collecting the remainders as digits. The result is deterministic and reversible. Limitation: because the algorithm operates on a single arbitrary-precision integer, encoding time grows faster than linearly with input size, and inputs approaching the tool's 100 KB cap incur noticeable latency. For bulk binary payloads, a chunked approach or the URL-safe Base64 variant (Base64url) may be more practical.

Formulas

Base62 encoding treats the input byte array as a single big-endian unsigned integer N and converts it to base 62 using the alphabet A = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".

$$N = \sum_{i=0}^{n-1} b_i \cdot 256^{n-1-i}$$

where b_i is the i-th byte of the UTF-8 encoded input and n is the total byte count. The encoded digits d_k are extracted by repeated division:

$$d_k = N \bmod 62, \qquad N \leftarrow \left\lfloor \frac{N}{62} \right\rfloor$$

The division is repeated until N reaches 0. The output string is the sequence of A[d_k] characters, reversed (most-significant digit first). Decoding reverses the process: starting from N = 0, each character maps to its index d_k and is accumulated as N ← N ⋅ 62 + d_k, then N is converted back to bytes.
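The two procedures above can be sketched in Python, whose built-in integers are arbitrary precision and so play the role of the big integer N. This is a minimal illustration, not the tool's source: the function names `base62_encode` and `base62_decode` are assumptions, and the sketch deliberately omits any handling of leading zero bytes.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}  # character -> digit value


def base62_encode(data: bytes) -> str:
    """Treat data as a big-endian unsigned integer and write it in base 62."""
    n = int.from_bytes(data, "big")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, d = divmod(n, 62)        # d_k = N mod 62, N <- floor(N / 62)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits))  # most-significant digit first


def base62_decode(text: str) -> bytes:
    """Accumulate N <- N * 62 + d_k, then convert N back to bytes."""
    n = 0
    for ch in text:
        n = n * 62 + INDEX[ch]
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


print(base62_encode("Hi".encode("utf-8")))  # 4oz
print(base62_decode("4oz"))                 # b'Hi'
```

For the input "Hi" the bytes are 0x48 0x69, so N = 72 ⋅ 256 + 105 = 18537 = 4 ⋅ 62² + 50 ⋅ 62 + 61, which maps to the characters 4, o, z.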

The expansion ratio is log(256) / log(62) ≈ 1.344, meaning each input byte produces roughly 1.344 Base62 characters.
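As a quick check of that ratio, the following Python sketch computes it and derives the worst-case output length for a given byte count using exact integer arithmetic. The helper name `max_encoded_length` is hypothetical and only serves this illustration.

```python
import math

# Characters of Base62 output per input byte: log(256) / log(62)
ratio = math.log(256) / math.log(62)
print(round(ratio, 3))  # 1.344


def max_encoded_length(num_bytes: int) -> int:
    """Smallest k such that 62**k can represent every num_bytes-byte value."""
    limit = 256 ** num_bytes
    k = 0
    while 62 ** k < limit:
        k += 1
    return k


print(max_encoded_length(16))  # 22 characters for a 128-bit value
```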

Where: N = integer representation of input bytes, b_i = i-th byte value (0-255), n = number of input bytes, A = Base62 alphabet string, d_k = k-th digit in base 62.

Reference Data

Encoding	Alphabet Size	Characters Used	URL Safe	Padding	Expansion Ratio (approx.)	Common Use Cases
Base62	62	0-9, A-Z, a-z	Yes	None	1.34×	Short URLs, trace IDs, tokens
Base64	64	A-Z, a-z, 0-9, +/	No	=	1.33×	Email (MIME), data URIs
Base64url	64	A-Z, a-z, 0-9, -_	Yes	Optional	1.33×	JWT, URL parameters
Base58	58	Base62 minus 0, O, I, l	Yes	None	1.37×	Bitcoin addresses, IPFS
Base32	32	A-Z, 2-7	Yes	=	1.60×	TOTP secrets, Crockford IDs
Base16 (Hex)	16	0-9, A-F	Yes	None	2.00×	Hash digests, MAC addresses
Base85 (Ascii85)	85	ASCII 33-117	No	None	1.25×	PDF streams, PostScript
Base36	36	0-9, A-Z	Yes	None	1.55×	Case-insensitive short IDs
Base91	91	ASCII printable subset	No	None	1.23×	Compact binary-to-text
Z85 (ZeroMQ)	85	Printable ASCII subset	Partial	None	1.25×	ZeroMQ frames, CurveZMQ
Base128	128	Full 7-bit ASCII	No	None	1.14×	Protobuf varints
UUencode	64	ASCII 32-95	No	Length byte	1.37×	Legacy Unix email

Frequently Asked Questions

Base64 uses the + and / characters, which carry special meaning in URLs: + is decoded as a space in form-encoded query strings, and / is the path separator. Both therefore require percent-encoding (%2B, %2F), which turns each special character into three. Base62 uses only 0-9, A-Z, and a-z, all of which are unreserved URI characters per RFC 3986, so no escaping is ever needed. Base64url exists as a compromise but still optionally uses = padding.
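The difference can be seen with Python's standard `base64` and `urllib.parse` modules. The three input bytes below are chosen for illustration because their Base64 form consists entirely of + and / characters; `base62_encode` is a hypothetical helper following the algorithm in the Formulas section, not the tool's own code.

```python
import base64
from urllib.parse import quote

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(data: bytes) -> str:
    """Hypothetical helper: big-endian integer written in base 62."""
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, d = divmod(n, 62)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits)) or ALPHABET[0]


raw = bytes([0xFB, 0xFF, 0xFE])

b64 = base64.b64encode(raw).decode("ascii")
b62 = base62_encode(raw)

# safe="" makes quote() escape "/" as well, as required inside a path segment.
print(b64, "->", quote(b64, safe=""))  # +//+ -> %2B%2F%2F%2B
print(b62, "->", quote(b62, safe=""))  # Base62 output passes through unchanged
```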

The tool caps input at 100 KB (102,400 bytes). The encoding algorithm converts the full byte array into a single BigInt and then divides it by 62 once per output character. Each division touches the whole remaining BigInt, so the running time grows roughly quadratically with input size. For inputs under 10 KB, encoding completes in under 100 ms. Beyond 50 KB, expect multi-second latency. For large payloads, consider chunked encoding or Base64.
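One possible shape for the chunked approach mentioned above is sketched here in Python. It is an assumption about how chunking could work, not something this tool implements: input is split into 8-byte blocks, each block becomes a fixed-width group of Base62 digits (11 characters for a full block), and the names `encode_chunked` and `decode_chunked` are hypothetical. The output is not compatible with the single-integer encoding described in the Formulas section.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
CHUNK_BYTES = 8  # assumed block size; a full block encodes to 11 characters


def _width(num_bytes: int) -> int:
    """Smallest digit count able to represent any num_bytes-byte value."""
    k = 0
    while 62 ** k < 256 ** num_bytes:
        k += 1
    return k


# Digit width per block size; the widths are all distinct, so the length of
# the final (possibly short) group identifies how many bytes it holds.
WIDTHS = {n: _width(n) for n in range(1, CHUNK_BYTES + 1)}
BYTES_FOR_WIDTH = {w: n for n, w in WIDTHS.items()}


def encode_chunked(data: bytes) -> str:
    out = []
    for start in range(0, len(data), CHUNK_BYTES):
        chunk = data[start:start + CHUNK_BYTES]
        n = int.from_bytes(chunk, "big")
        digits = []
        for _ in range(WIDTHS[len(chunk)]):  # fixed width keeps leading zeros
            n, d = divmod(n, 62)
            digits.append(ALPHABET[d])
        out.append("".join(reversed(digits)))
    return "".join(out)


def decode_chunked(text: str) -> bytes:
    full = WIDTHS[CHUNK_BYTES]
    out = bytearray()
    for start in range(0, len(text), full):
        group = text[start:start + full]
        n = 0
        for ch in group:
            n = n * 62 + INDEX[ch]
        out += n.to_bytes(BYTES_FOR_WIDTH[len(group)], "big")
    return bytes(out)
```

Because every block is a small fixed-size integer, the cost grows linearly with input length. The trade-off is a slightly worse expansion ratio (11 characters per 8 bytes, about 1.375).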

Yes. Using Base62 strings as database keys is one of the most common applications. Base62 IDs are shorter than UUID hex representations (22 Base62 characters vs 32 hex characters for 128 bits) and work well under case-sensitive collations. However, verify that your database collation is case-sensitive (utf8_bin in MySQL, C collation in PostgreSQL). A case-insensitive collation will treat aB and Ab as identical, causing key collisions.
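The 22-character figure can be illustrated with Python's standard `uuid` module. The sketch below pads every ID to a fixed width of 22 characters so that keys sort and compare consistently; the fixed-width choice and the function names `uuid_to_base62` and `base62_to_uuid` are assumptions made for this example.

```python
import uuid

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
WIDTH = 22  # 62**22 > 2**128, so 22 digits cover every UUID


def uuid_to_base62(u: uuid.UUID) -> str:
    """Encode the UUID's 128-bit integer as exactly 22 Base62 characters."""
    n = u.int
    digits = []
    for _ in range(WIDTH):
        n, d = divmod(n, 62)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits))


def base62_to_uuid(text: str) -> uuid.UUID:
    n = 0
    for ch in text:
        n = n * 62 + INDEX[ch]
    return uuid.UUID(int=n)


key = uuid_to_base62(uuid.uuid4())
print(len(key))  # 22, versus 32 for the hex form
```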

When the input byte array starts with zero bytes (e.g., \x00\x00Hello), the BigInt representation loses that leading-zero information since 0 ⋅ 256ⁿ = 0. To preserve it, this tool prepends a 4-byte length header (the original byte count as a 4-byte big-endian integer) to the byte sequence, and the header is included in the BigInt before Base62 conversion. The decoder reads this length to restore the exact original byte sequence, including any leading zeros.
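A length-header scheme of this kind could look like the following Python sketch. The description above does not specify how the decoder locates the header, so the recovery logic here is an assumption: since the header's own leading zero bytes also vanish in the integer, the decoder tries each possible header size (0 to 4 significant bytes) and accepts the one where the header value equals the payload length. The names `encode_with_length` and `decode_with_length` are hypothetical.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_with_length(data: bytes) -> str:
    header = len(data).to_bytes(4, "big")  # 4-byte big-endian byte count
    n = int.from_bytes(header + data, "big")
    digits = []
    while n > 0:
        n, d = divmod(n, 62)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits)) or ALPHABET[0]


def decode_with_length(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 62 + INDEX[ch]
    min_len = (n.bit_length() + 7) // 8  # bytes that survived in the integer
    # Assumed recovery step: between 0 and 4 of those bytes are header bytes.
    for header_bytes in range(5):
        length = min_len - header_bytes
        if length >= 0 and n >> (8 * length) == length:
            payload = n & ((1 << (8 * length)) - 1)
            return payload.to_bytes(length, "big")  # restores leading zeros
    raise ValueError("length header does not match payload")


original = b"\x00\x00Hello"
assert decode_with_length(encode_with_length(original)) == original
```

At most one header size can satisfy the check, because the header value shrinks as the assumed payload length grows.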

No. JavaScript's toString(36) produces Base36 (digits + lowercase letters only, 36 characters), not Base62. Base62 adds uppercase letters, giving 62 distinct symbols. This reduces output length by roughly 13% compared to Base36 for the same input. JavaScript's built-in radix conversions stop at radix 36, and Number.prototype.toString is exact only up to Number.MAX_SAFE_INTEGER (2⁵³ − 1), whereas this tool implements Base62 on top of BigInt for arbitrary precision.
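The length difference can be checked with a small Python sketch that writes the same 128-bit value in both bases. The generic helper `to_base` is hypothetical, and it uses uppercase letters for Base36 digits, whereas JavaScript's toString(36) emits lowercase; the digit count is the same either way.

```python
import math

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to_base(n: int, base: int) -> str:
    """Write a non-negative integer in the given base (2 to 62)."""
    if not 2 <= base <= 62:
        raise ValueError("base must be between 2 and 62")
    if n == 0:
        return DIGITS[0]
    out = []
    while n > 0:
        n, d = divmod(n, base)
        out.append(DIGITS[d])
    return "".join(reversed(out))


value = 2**128 - 1           # largest 128-bit value
b36 = to_base(value, 36)
b62 = to_base(value, 62)
print(len(b36), len(b62))    # 25 22

# Asymptotic length ratio between the two bases, about 0.868 (13% shorter)
print(round(math.log(36) / math.log(62), 3))
```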

The decoder validates every character against the 62-character alphabet before processing. Any character not in 0-9, A-Z, a-z triggers an immediate error with identification of the invalid character and its position. This prevents silent data corruption that would occur if invalid characters were simply skipped or mapped to zero.
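A validating decoder along these lines might look like the Python sketch below. The exception class `Base62DecodeError`, the function name `base62_decode_strict`, the message wording, and the 0-based position are all assumptions for illustration; the tool's actual error format is not specified here.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


class Base62DecodeError(ValueError):
    """Raised when the input contains a character outside the alphabet."""


def base62_decode_strict(text: str) -> bytes:
    # Validate the whole string first so nothing is decoded from bad input.
    for position, ch in enumerate(text):
        if ch not in INDEX:
            raise Base62DecodeError(
                f"invalid character {ch!r} at position {position}"
            )
    n = 0
    for ch in text:
        n = n * 62 + INDEX[ch]
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


try:
    base62_decode_strict("4o-z")
except Base62DecodeError as err:
    print(err)  # invalid character '-' at position 2
```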