About

Base65536 is a binary-to-text encoding that stores 2 bytes per Unicode code point, making it denser per character than Base64 (which stores only 6 bits per character). It was designed by Sam Hughes for passing binary data through Unicode-aware text channels such as Twitter. The repertoire consists of 256 blocks of 256 consecutive safe code points each, forming a lookup table that maps each pair of input bytes to a single character. One additional block of 256 “padding” code points encodes the final byte when the input length is odd. Base65536 carries no checksum, so a faulty decoder can emit garbled or silently corrupted bytes without any warning. This tool implements the full repertoire mapping client-side with no server dependency. It approximates the reference specification and assumes well-formed input; a code point outside the known repertoire produces an explicit error rather than silent failure.

Formulas

Each Base65536 character c with code point cp is decoded by locating which block it belongs to:

blockStart = cp − (cp mod 256)

The offset within the block gives one byte value:

offset = cp mod 256

For a 2-byte block at repertoire index b, the offset is the first byte and the block index is the second:

byte₁ = offset , byte₂ = b

For the 1-byte (padding) block, the offset is the only byte:

byte₁ = offset

The resulting byte array is then decoded as UTF-8 via TextDecoder to produce the final plaintext string.

Where cp = Unicode code point of the encoded character, b = repertoire index (0 - 255) mapping block start to byte value, offset = position within the 256-character block.
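The decoding steps above can be sketched in a few lines of JavaScript. This is a minimal illustration rather than the tool's actual source: the names PAIR_RANGES, PAD_START, decodeChar and decodeBase65536 are hypothetical, and the block ranges and byte order (offset first, block index second) are assumptions based on the reference repertoire that should be checked against the specification.

```javascript
// Assumed repertoire: inclusive [first, last] code point ranges covering the
// 256 two-byte blocks in index order. Verify against the specification.
const PAIR_RANGES = [
  [0x3400, 0x4cff], [0x4e00, 0x9eff], [0xa100, 0xa3ff], [0xa500, 0xa5ff],
  [0x10600, 0x106ff], [0x12000, 0x122ff], [0x13000, 0x133ff],
  [0x14400, 0x145ff], [0x16800, 0x169ff], [0x20000, 0x285ff],
];
const PAD_START = 0x1500; // assumed padding block U+1500 to U+15FF

// Map each block start to its repertoire index b (0..255).
const blockIndex = new Map();
for (const [first, last] of PAIR_RANGES) {
  for (let start = first; start <= last; start += 256) {
    blockIndex.set(start, blockIndex.size);
  }
}

// Decode one code point into the 1 or 2 bytes it carries.
function decodeChar(cp) {
  const offset = cp % 256;
  const blockStart = cp - offset;
  if (blockStart === PAD_START) return [offset]; // padding: one byte
  const b = blockIndex.get(blockStart);
  if (b === undefined) {
    const hex = cp.toString(16).toUpperCase().padStart(4, '0');
    throw new Error(`Code point U+${hex} is outside the repertoire`);
  }
  return [offset, b]; // byte1 = offset, byte2 = block index
}

// Decode a whole string: whitespace is skipped, padding must come last.
function decodeBase65536(text) {
  const bytes = [];
  let sawPadding = false;
  for (const ch of text) { // iterates by code point, not by UTF-16 unit
    if (/\s/.test(ch)) continue;
    if (sawPadding) throw new Error('Padding character before end of input');
    const decoded = decodeChar(ch.codePointAt(0));
    if (decoded.length === 1) sawPadding = true;
    bytes.push(...decoded);
  }
  return Uint8Array.from(bytes);
}
```

For example, `new TextDecoder().decode(decodeBase65536('\u9A68\uA36C\u556F\u{12077}\uA372\u1564'))` yields `hello world` under these assumptions.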

Reference Data

Block Start (Hex)	Block Start (Decimal)	Type	Bytes Encoded	Example Char
U+03400	13312	2-byte	2	㐀
U+03500	13568	2-byte	2	㔀
U+03600	13824	2-byte	2	㘀
U+03700	14080	2-byte	2	㜀
U+03800	14336	2-byte	2	㠀
U+03900	14592	2-byte	2	㤀
U+03A00	14848	2-byte	2	㨀
U+03B00	15104	2-byte	2	㬀
U+03C00	15360	2-byte	2	㰀
U+03D00	15616	2-byte	2	㴀
U+03E00	15872	2-byte	2	㸀
U+03F00	16128	2-byte	2	㼀
U+04000	16384	2-byte	2	䀀
U+04100	16640	2-byte	2	䄀
U+04200	16896	2-byte	2	䈀
U+04300	17152	2-byte	2	䌀
U+04400	17408	2-byte	2	䐀
U+04500	17664	2-byte	2	䔀
U+04600	17920	2-byte	2	䘀
U+04700	18176	2-byte	2	䜀
U+01500	5376	1-byte (pad)	1	ᔀ
Full repertoire: 256 two-byte blocks + 1 one-byte padding block (U+1500 to U+15FF), mapped from the Base65536 specification.

Frequently Asked Questions

Unicode code points extend to U+10FFFF, but blocks that are safe, printable, and stable across platforms are limited. Base65536 uses 256 blocks of 256 code points each, giving exactly 65,536 safe slots, one for every possible pair of byte values (0-255 each). Going higher would require code points in unstable or surrogate ranges.

When the input has an odd number of bytes, the last byte is encoded using a separate padding block of 256 code points (U+1500 to U+15FF), one code point per byte value. During decoding, encountering a padding-block character signals that only 1 byte should be emitted for that character. A padding character appearing anywhere except the final position indicates malformed input.
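The effect of padding is easiest to see from the encoding side. The sketch below is illustrative only: encodeBase65536, PAIR_RANGES and PAD_START are hypothetical names, and the block layout (a single padding block at U+1500, the first byte of each pair stored as the offset) is an assumption to check against the specification.

```javascript
// Assumed repertoire ranges for the 256 two-byte blocks, in index order.
const PAIR_RANGES = [
  [0x3400, 0x4cff], [0x4e00, 0x9eff], [0xa100, 0xa3ff], [0xa500, 0xa5ff],
  [0x10600, 0x106ff], [0x12000, 0x122ff], [0x13000, 0x133ff],
  [0x14400, 0x145ff], [0x16800, 0x169ff], [0x20000, 0x285ff],
];
const PAD_START = 0x1500; // assumed padding block

// blockStarts[b] is the first code point of the block with index b.
const blockStarts = [];
for (const [first, last] of PAIR_RANGES) {
  for (let start = first; start <= last; start += 256) blockStarts.push(start);
}

function encodeBase65536(bytes) {
  let out = '';
  for (let i = 0; i < bytes.length; i += 2) {
    if (i + 1 < bytes.length) {
      // Full pair: second byte selects the block, first byte is the offset.
      out += String.fromCodePoint(blockStarts[bytes[i + 1]] + bytes[i]);
    } else {
      // Odd trailing byte: emit one character from the padding block.
      out += String.fromCodePoint(PAD_START + bytes[i]);
    }
  }
  return out;
}
```

Under these assumptions the 11-byte string `hello world` encodes to six characters, the last of which (U+1564) comes from the padding block, while the 2-byte string `hi` encodes to a single two-byte character with no padding.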

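Base64 encodes 6 bits per ASCII character. In UTF-16 storage (2 bytes per character), that yields 6 bits per 16 bits = 37.5% efficiency. Base65536 encodes 16 bits per code point, but code points above U+FFFF occupy two UTF-16 code units, so its UTF-16 efficiency is roughly 64% for typical data rather than 100%. Its advantage is largest where length is counted in code points: a 280-code-point limit holds 280 × 2 = 560 bytes of Base65536 vs. 210 bytes of Base64. Twitter's current weighted counting treats most non-Latin code points as two characters, which halves the Base65536 figure to about 280 bytes per tweet.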
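The arithmetic can be checked with a short calculation. This is a rough sketch under stated assumptions: utf16Efficiency is a hypothetical helper, the input bytes are taken to be uniformly random, and 146 of the 256 two-byte blocks are assumed to lie above U+FFFF (so their characters take two UTF-16 code units each).

```javascript
// Payload bits divided by storage bits when text is held as UTF-16.
function utf16Efficiency(payloadBits, avgCodeUnits) {
  return payloadBits / (16 * avgCodeUnits);
}

// Base64: 6 payload bits in one 16-bit code unit.
const base64Efficiency = utf16Efficiency(6, 1);

// Base65536: assumed 146 of 256 blocks need a surrogate pair (2 code units).
const ASTRAL_BLOCKS = 146;
const avgCodeUnits = (ASTRAL_BLOCKS * 2 + (256 - ASTRAL_BLOCKS)) / 256;
const base65536Efficiency = utf16Efficiency(16, avgCodeUnits);

// Bytes that fit in a 280-character budget when each code point counts once.
const base64Bytes = Math.floor(280 / 4) * 3; // 4 characters carry 3 bytes
const base65536Bytes = 280 * 2;              // 1 character carries 2 bytes

console.log(base64Efficiency, base65536Efficiency.toFixed(3));
console.log(base64Bytes, base65536Bytes);
```

The per-character density favours Base65536 wherever a channel counts code points; measured in UTF-16 storage, the gap is narrower because of the surrogate pairs.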

The converter decodes the byte stream and interprets it as UTF-8 by default. If the original data used a different character set (e.g., ISO-8859-1) or was not text at all, byte sequences that are not valid UTF-8 appear as replacement characters (U+FFFD) in the output. For raw binary inspection, the tool shows the byte count; a hex representation is available by inspecting the decoded Uint8Array in browser DevTools.
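The following sketch shows how replacement characters arise and one way to produce a hex dump by hand, for example in the DevTools console. The sample bytes and the toHex helper are hypothetical; TextDecoder and its `fatal` option are standard.

```javascript
// "hi" followed by a lone 0xE9 byte (é in ISO-8859-1, invalid as UTF-8).
const bytes = Uint8Array.from([0x68, 0x69, 0xe9]);

// Default (lenient) decoding substitutes U+FFFD for the invalid byte.
const lenient = new TextDecoder('utf-8').decode(bytes);

// With fatal: true, invalid UTF-8 raises a TypeError instead.
let strictFailed = false;
try {
  new TextDecoder('utf-8', { fatal: true }).decode(bytes);
} catch (err) {
  strictFailed = true;
}

// Hex view of the raw bytes, two lowercase digits per byte.
function toHex(data) {
  return Array.from(data, (b) => b.toString(16).padStart(2, '0')).join(' ');
}

console.log(JSON.stringify(lenient), strictFailed, toHex(bytes));
```

Strict decoding is a convenient way to tell whether the decoded payload was valid UTF-8 text before trusting what is displayed.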

This error occurs when a code point in the input does not belong to any of the 257 registered blocks (256 two-byte + 1 padding). Common causes: copy-paste corruption, invisible formatting characters inserted by rich-text editors, or input that is actually Base64 or another encoding mistakenly treated as Base65536. Whitespace characters (spaces, newlines, tabs) are silently ignored during decoding.
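When the cause is an invisible character, listing the code points of the input usually reveals it. The helper below is a hypothetical diagnostic, not part of the tool; it also shows that a zero-width space (U+200B) is not matched by the JavaScript whitespace class, so a decoder that skips only `\s` characters would still reject it.

```javascript
// Return each code point of the input in U+XXXX notation.
function listCodePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// A sample string with a zero-width space hidden between two characters.
const suspicious = '\u9A68\u200B\u1564';
console.log(listCodePoints(suspicious).join(' '));

// U+200B is a format character, not whitespace, in JavaScript regexes.
const zwspIsWhitespace = /\s/.test('\u200B');
console.log(zwspIsWhitespace);
```

Array.from iterates strings by code point, so characters above U+FFFF are reported once rather than as two surrogate halves.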

The converter processes input on the main thread. The maximum JavaScript string length depends on the engine (on the order of 2^28 to 2^30 UTF-16 code units in current browsers). In practice, inputs up to about 1 MB of Base65536 text decode within milliseconds; for larger inputs, expect a brief processing delay, shown via the loading indicator.