About

Base64 encoding represents binary or text data as an ASCII string using a 64-character alphabet (A-Z, a-z, 0-9, +, /). XML payloads transmitted through APIs, SOAP envelopes, or embedded in JSON frequently arrive Base64-encoded. Decoding errors corrupt document structure and break downstream parsers; they are most common with multi-byte UTF-8 sequences (accented letters, non-Latin scripts) and with markup-significant characters such as & and <. This tool decodes Base64 to raw XML with proper UTF-8 reconstruction, validates well-formedness via the browser’s native DOMParser, and applies structural pretty-printing. It also encodes XML back to Base64 for storage or transport.
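The encoding direction can be sketched as follows. This is a minimal illustration, not the tool's actual source: the function name encodeXmlToBase64 is hypothetical, and the sketch assumes an environment that provides TextEncoder and btoa, as current browsers do.

```javascript
// Hypothetical helper: encode an XML string to Base64 via its UTF-8 bytes.
function encodeXmlToBase64(xml) {
  const bytes = new TextEncoder().encode(xml); // TextEncoder always emits UTF-8
  let binary = "";
  for (const byte of bytes) {
    // btoa accepts only characters in the range 0-255, so map one char per byte.
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

console.log(encodeXmlToBase64("<a>\u00e9</a>")); // "PGE+w6k8L2E+"
```

Going through the UTF-8 bytes first matters because calling btoa directly on a string containing characters above U+00FF throws an error.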

The converter handles padding normalization (missing = characters), detects binary-vs-text content, and reports exact parser error locations. Limitation: this tool validates XML well-formedness, not schema conformance (XSD/DTD validation requires a server-side process). For SOAP debugging, ensure the Base64 payload does not include a BOM prefix (0xEF 0xBB 0xBF) unless your parser expects it.
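How the tool distinguishes binary from text is not described here, so the following is only one plausible approach, stated as an assumption: treat the decoded bytes as text when they form valid UTF-8 and contain no NUL byte, and check for the BOM by comparing the first three bytes. The function name inspectDecodedBytes is hypothetical.

```javascript
// Hypothetical helper: report whether decoded bytes start with a UTF-8 BOM
// and whether they look like text. The heuristic is an assumption, not the
// tool's documented behavior.
function inspectDecodedBytes(bytes) {
  const hasBom =
    bytes.length >= 3 &&
    bytes[0] === 0xef &&
    bytes[1] === 0xbb &&
    bytes[2] === 0xbf;

  let isText = true;
  try {
    // With fatal: true, decode() throws a TypeError on invalid UTF-8.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    isText = false;
  }
  // NUL is valid UTF-8 but not a legal XML 1.0 character, so treat it as binary.
  if (isText && bytes.includes(0)) isText = false;

  return { hasBom, isText };
}

// BOM followed by "<a/>"
console.log(inspectDecodedBytes(Uint8Array.of(0xef, 0xbb, 0xbf, 0x3c, 0x61, 0x2f, 0x3e)));
// First bytes of a JPEG file: 0xFF is never valid in UTF-8
console.log(inspectDecodedBytes(Uint8Array.of(0xff, 0xd8, 0xff)));
```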

Formulas

Base64 encoding maps every group of 3 input bytes (24 bits) to 4 printable ASCII characters (6 bits each).

n_out = 4 ⋅ ceil(n_in / 3)

Where n_in = number of input bytes and n_out = number of Base64 output characters (including padding). Each sextet index i maps to the character T[i] in the alphabet table.

T = [A...Z, a...z, 0...9, +, /]
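As a worked illustration of the mapping and the length formula, the sketch below encodes bytes by hand using the table T. The function names encodeBytes and expectedLength are hypothetical; in practice the browser's built-in btoa does this work.

```javascript
// The alphabet table T: index i (0-63) maps to character T[i].
const T = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// n_out = 4 * ceil(n_in / 3)
function expectedLength(nIn) {
  return 4 * Math.ceil(nIn / 3);
}

// Encode an array of byte values (0-255) three bytes at a time.
function encodeBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    // Pack up to 3 bytes into one 24-bit group; missing bytes count as 0.
    const group =
      (bytes[i] << 16) | ((has1 ? bytes[i + 1] : 0) << 8) | (has2 ? bytes[i + 2] : 0);
    // Split the group into four sextets and look each one up in T.
    out += T[(group >> 18) & 63];
    out += T[(group >> 12) & 63];
    out += has1 ? T[(group >> 6) & 63] : "="; // pad when the byte is missing
    out += has2 ? T[group & 63] : "=";
  }
  return out;
}

console.log(encodeBytes([0x3c, 0x61, 0x3e])); // "PGE+"  (the bytes of "<a>")
console.log(encodeBytes([0xc3, 0xa9]));       // "w6k="  (UTF-8 bytes of U+00E9)
console.log(encodeBytes([0x3c]));             // "PA=="
```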

Decoding reverses this: each Base64 character is resolved to its 6-bit value, concatenated into a bitstream, and split into 8-bit bytes. For UTF-8 XML content, multi-byte sequences (characters above U+007F) require reassembly: a 2-byte sequence uses bits 110xxxxx 10xxxxxx, a 3-byte uses 1110xxxx 10xxxxxx 10xxxxxx, and a 4-byte uses 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. The browser’s native atob returns a Latin-1 string; proper UTF-8 reconstruction requires passing through Uint8Array and TextDecoder.

bytes = Uint8Array.from(atob(b64), c => c.charCodeAt(0)) → new TextDecoder('utf-8').decode(bytes)
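Written out as a small function, the decoding path looks roughly like this. The name decodeBase64Utf8 is hypothetical, and the sketch assumes well-formed, padded Base64 input.

```javascript
// Hypothetical helper: decode Base64 to a string, interpreting the bytes as UTF-8.
function decodeBase64Utf8(b64) {
  const binary = atob(b64); // one character per byte, code points 0-255
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

// "w6k=" encodes the two bytes 0xC3 0xA9, the UTF-8 form of U+00E9.
console.log(atob("w6k="));             // two Latin-1 characters: "\u00c3\u00a9"
console.log(decodeBase64Utf8("w6k=")); // one character: "\u00e9"
```

The first call shows the misreading that atob alone produces: each byte becomes its own character. The second reassembles the two bytes into the single intended character.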

XML well-formedness is validated by parsing with DOMParser and checking for the presence of a parsererror element in the resulting document tree.
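A minimal sketch of that check follows. The function name checkWellFormed and the shape of its return value are assumptions made for illustration; the second parameter exists only so the parser can be swapped out where DOMParser is not defined, for example in tests outside a browser.

```javascript
// Hypothetical helper: parse XML text and report whether a parsererror
// element appears in the resulting document.
function checkWellFormed(xmlText, Parser = globalThis.DOMParser) {
  const doc = new Parser().parseFromString(xmlText, "application/xml");
  // DOMParser does not throw on malformed XML; it returns a document that
  // contains a <parsererror> element describing the problem.
  const errorNode = doc.querySelector("parsererror");
  return errorNode
    ? { ok: false, message: errorNode.textContent }
    : { ok: true, message: "" };
}

// In a browser:
//   checkWellFormed("<a><b></b></a>").ok  is true
//   checkWellFormed("<a><b></a>").ok      is false (mismatched tags)
```

The wording and layout of the error message differ between browsers, so it is best shown to the user as-is rather than parsed.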

Reference Data

Base64 Character	Decimal Value	Binary	Notes
A	0	000000	Index start
Z	25	011001	End uppercase
a	26	011010	Start lowercase
z	51	110011	End lowercase
0	52	110100	Start digits
9	61	111101	End digits
+	62	111110	Standard alphabet
/	63	111111	Standard alphabet
=	-	-	Padding character
-	62	111110	URL-safe variant replaces +
_	63	111111	URL-safe variant replaces /
Encoding Ratios
Input size	3 bytes → 4 Base64 characters
Output overhead	~33% size increase
Padding rule	Input mod 3 = 1 → ==; mod 3 = 2 → =
Common XML Encoded Entities
&lt;	Less-than: <
&gt;	Greater-than: >
&amp;	Ampersand: &
&apos;	Apostrophe: '
&quot;	Quotation mark: "
XML Declaration Defaults
Version	1.0 (required)
Encoding	UTF-8 (default if omitted)
Standalone	yes or no

Frequently Asked Questions

The native atob function returns a Latin-1 (ISO 8859-1) string. If the original XML contains multi-byte UTF-8 characters (accented letters, CJK, emoji), each byte is misinterpreted as a separate Latin-1 character. This tool reconstructs proper UTF-8 by mapping each character code to a Uint8Array and decoding with TextDecoder('utf-8'). If you still see garbled output, the source may have been double-encoded (Base64 applied twice) - try decoding the result a second time.

RFC 4648 requires padding with = to make the string length a multiple of 4. Many APIs strip padding. This converter normalizes input by appending the required number of = characters (0, 1, or 2) based on length mod 4 before decoding. It also strips whitespace and newlines that appear in PEM-formatted or MIME-wrapped strings.
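The normalization step can be sketched like this. The function name normalizeBase64 is hypothetical, and rejecting a length of 1 mod 4 is an added assumption: no amount of padding makes such a string valid.

```javascript
// Hypothetical helper: strip whitespace and restore missing "=" padding.
function normalizeBase64(input) {
  const compact = input.replace(/\s+/g, ""); // remove spaces, tabs, CR, LF
  const remainder = compact.length % 4;
  if (remainder === 1) {
    // A single leftover character carries only 6 bits, less than one byte.
    throw new Error("Invalid Base64 length");
  }
  // remainder 0 -> no padding, 2 -> "==", 3 -> "="
  return remainder === 0 ? compact : compact + "=".repeat(4 - remainder);
}

console.log(normalizeBase64("w6k"));              // "w6k="
console.log(normalizeBase64("PA"));               // "PA=="
console.log(normalizeBase64("PGE+\r\nw6k8\r\n")); // "PGE+w6k8"
```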

The tool does not validate XML against an XSD or DTD. Browser-based DOMParser checks well-formedness only: proper nesting, closed tags, valid character references. Schema validation (XSD, DTD, Schematron) requires a validating parser with schema awareness, which browsers do not provide natively. The tool will confirm your XML is syntactically correct but cannot verify it conforms to a specific schema definition.

The practical limit depends on browser memory. JavaScript strings in modern browsers can hold up to ~512 MB, but atob performance degrades above ~10 MB of Base64 input (~7.5 MB decoded). This tool enforces a 10 MB input limit to prevent tab crashes. For larger payloads, use a streaming decoder such as the command-line base64 utility.
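The size arithmetic can be checked before decoding. In this sketch the names MAX_INPUT_CHARS, decodedByteLength, and isWithinLimit are hypothetical, "10 MB" is assumed to mean 10 × 1024 × 1024 characters of Base64 text, and the input is assumed to be padded with whitespace already removed.

```javascript
// Assumption: the 10 MB limit is counted in Base64 characters (1 byte each in ASCII).
const MAX_INPUT_CHARS = 10 * 1024 * 1024;

// Every 4 Base64 characters decode to 3 bytes, minus one byte per "=" pad.
function decodedByteLength(b64) {
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  return (b64.length / 4) * 3 - padding;
}

function isWithinLimit(b64) {
  return b64.length <= MAX_INPUT_CHARS;
}

console.log(decodedByteLength("PGE+")); // 3
console.log(decodedByteLength("w6k=")); // 2
console.log(decodedByteLength("PA==")); // 1
```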

The pretty-printer operates on the raw text output, not on a parsed DOM tree, so it preserves CDATA sections (<![CDATA[...]]>), processing instructions (<?...?>), and comments (<!--...-->) verbatim. Indentation is applied based on tag depth counting. Content inside CDATA blocks is not re-indented.
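A depth-counting printer of this kind might look like the sketch below. It is not the tool's actual implementation: the function name prettyPrintXml and the token pattern are assumptions, text nodes are trimmed (which would alter whitespace-sensitive content), and the simple tag pattern does not handle a literal > inside an attribute value or a DOCTYPE with an internal subset.

```javascript
// Tokens, in priority order: comment, CDATA section, processing instruction,
// any other tag, or a run of text between tags.
const XML_TOKEN = /<!--[\s\S]*?-->|<!\[CDATA\[[\s\S]*?\]\]>|<\?[\s\S]*?\?>|<[^>]+>|[^<]+/g;

function prettyPrintXml(xml, indent = "  ") {
  const lines = [];
  let depth = 0;
  for (const token of xml.match(XML_TOKEN) ?? []) {
    if (token.startsWith("<!--") || token.startsWith("<![CDATA[") || token.startsWith("<?")) {
      // Comments, CDATA, and PIs are emitted verbatim; only the line is indented.
      lines.push(indent.repeat(depth) + token);
    } else if (token.startsWith("</")) {
      depth = Math.max(0, depth - 1); // closing tag: step out before printing
      lines.push(indent.repeat(depth) + token);
    } else if (token.startsWith("<")) {
      lines.push(indent.repeat(depth) + token);
      // Opening tag: step in, unless self-closing or a declaration like <!DOCTYPE>.
      if (!token.endsWith("/>") && !token.startsWith("<!")) depth += 1;
    } else {
      const text = token.trim();
      if (text) lines.push(indent.repeat(depth) + text);
    }
  }
  return lines.join("\n");
}

console.log(prettyPrintXml("<a><b>x</b><![CDATA[ <raw> ]]><!-- c --><d/></a>"));
```

Because the CDATA pattern is tried before the generic tag pattern, markup-like text inside a CDATA section is never counted toward the depth.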

URL-safe Base64 (RFC 4648 §5) replaces + with - and / with _, and often omits padding. This converter auto-detects URL-safe encoding by checking for - or _ characters, performs the character substitution, restores padding, then decodes normally. No manual conversion is needed.
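The detection and substitution steps can be sketched as follows. The function names looksUrlSafe and urlSafeToStandard are hypothetical, and the input is assumed to contain no whitespace.

```javascript
// Hypothetical helper: "-" and "_" never occur in the standard alphabet,
// so their presence signals the URL-safe variant.
function looksUrlSafe(input) {
  return /[-_]/.test(input);
}

// Hypothetical helper: map the URL-safe alphabet back to the standard one
// and restore any padding that was omitted.
function urlSafeToStandard(input) {
  const swapped = input.replace(/-/g, "+").replace(/_/g, "/");
  const remainder = swapped.length % 4;
  if (remainder === 1) throw new Error("Invalid Base64 length");
  return remainder === 0 ? swapped : swapped + "=".repeat(4 - remainder);
}

console.log(urlSafeToStandard("PGE-"));       // "PGE+"
console.log(urlSafeToStandard("PGE_"));       // "PGE/"
console.log(atob(urlSafeToStandard("PGE-"))); // "<a>"
```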