About

Base64 encoding transforms arbitrary byte sequences into a restricted ASCII character set of 64 symbols (A-Z, a-z, 0-9, +, /) plus = for padding. CSV files frequently contain commas, newlines, and non-ASCII characters (diacritics, currency symbols like € or ¥) that can be corrupted in transit through systems expecting plain ASCII. Embedding raw CSV in JSON payloads, URL parameters, or email headers without encoding causes parsing failures; a single unescaped newline is enough to break a JSON request body. This tool performs UTF-8-safe Base64 conversion, handling multi-byte characters correctly where a naive btoa call fails.

The encoding inflates data size by a factor of approximately 4/3 (≈ 33% overhead). For a 1MB CSV file, expect roughly 1.33MB of Base64 output (about 1.37MB if the output is wrapped into 76-character MIME lines). This tool handles files up to 10MB and supports an optional data:text/csv;base64, MIME prefix for direct use in Data URIs. Limitation: the tool assumes UTF-8 text. Files whose original encoding is not UTF-8, or that embed binary data in CSV cells, may encode without error but produce unexpected results on decode.

Formulas

Base64 encoding converts each group of 3 input bytes (24 bits) into 4 output characters by splitting into 6-bit segments and mapping each to the Base64 alphabet.

n_out = 4 ⋅ ceil(n_in ÷ 3)

Where n_in = number of input bytes and n_out = number of output Base64 characters (including padding).
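
As a worked example of the length formula above, the following sketch computes n_out for a few inputs. The function name base64Length is hypothetical and used only for illustration.

```javascript
// Predicted Base64 output length (padding included) for nIn input bytes.
// Implements n_out = 4 * ceil(n_in / 3).
function base64Length(nIn) {
  return 4 * Math.ceil(nIn / 3);
}

console.log(base64Length(1));                // 4  ("A" encodes to "QQ==")
console.log(base64Length(5));                // 8  ("Hello" encodes to "SGVsbG8=")
console.log(base64Length(10 * 1024 * 1024)); // 13981016 (10MB, counting 1MB as 1024 * 1024 bytes)
```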

For UTF-8 safe encoding, each character c is first converted to its UTF-8 byte sequence B using TextEncoder. The byte array is then converted to a binary string and passed to btoa:

encode(s) = btoa(String.fromCharCode(...new TextEncoder().encode(s)))

Decoding reverses the process:

decode(b) = new TextDecoder().decode(Uint8Array.from(atob(b), c => c.charCodeAt(0)))
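
The two formulas above can be sketched as a pair of runnable functions. This is a minimal illustration, not necessarily how this tool implements it: the names encodeUtf8Base64 and decodeUtf8Base64 and the chunk size are assumptions. The bytes are converted to a binary string in slices because spreading a very large array into a single String.fromCharCode call can exceed the engine's argument limit.

```javascript
// Encode a JavaScript string as Base64 over its UTF-8 bytes.
function encodeUtf8Base64(text) {
  const bytes = new TextEncoder().encode(text);
  const CHUNK = 0x8000; // assumed slice size; keeps each argument list small
  let binary = "";
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}

// Decode Base64 back to a string, interpreting the bytes as UTF-8.
function decodeUtf8Base64(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const csv = "name,price\r\nCafé,€5\r\n";
const encoded = encodeUtf8Base64(csv);
console.log(decodeUtf8Base64(encoded) === csv); // true
```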

The size ratio approaches a constant for large inputs: n_out ÷ n_in ≈ 4 ÷ 3 ≈ 1.333 (padding makes it slightly larger for short inputs). Each 6-bit index i maps to character T[i] in the alphabet table T = ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/.
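
To make the 3-byte to 4-character mapping concrete, here is a hand-written encoder that uses the table T directly instead of btoa. It is an illustrative sketch (the function name encodeBytes is hypothetical); in practice the built-in btoa does the same work.

```javascript
const T = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode a Uint8Array by packing each 3-byte group into 24 bits
// and emitting four 6-bit indices into T.
function encodeBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i;
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0; // missing bytes are zero-filled
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const group = (b0 << 16) | (b1 << 8) | b2;

    out += T[(group >> 18) & 63];
    out += T[(group >> 12) & 63];
    // A short final group gets "=" in place of the characters it cannot fill.
    out += remaining > 1 ? T[(group >> 6) & 63] : "=";
    out += remaining > 2 ? T[group & 63] : "=";
  }
  return out;
}

console.log(encodeBytes(new TextEncoder().encode("Hello"))); // SGVsbG8=
```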

Reference Data

Property	Value / Detail
Base64 Alphabet	A-Z, a-z, 0-9, +, /
Padding Character	= (appended to make output length divisible by 4)
Input Encoding	UTF-8 (multi-byte safe)
Size Overhead	≈ 33.3% increase (4:3 ratio)
Max Input Size (this tool)	10MB
RFC Standard	RFC 4648 (Base Encodings)
MIME Data URI Prefix	`data:text/csv;base64,`
Line Length (MIME)	76 characters per line (RFC 2045)
URL-Safe Variant	Replaces + with -, / with _
Bits per Base64 Char	6 bits
Input Group Size	3 bytes (24 bits) → 4 Base64 characters
Common CSV Delimiters	Comma (,), Semicolon (;), Tab (\t), Pipe (|)
CSV Line Endings	CRLF (\r\n) or LF (\n)
CSV Standard	RFC 4180
Empty String Encoded	(empty output)
Single Char "A" Encoded	QQ==
"Hello" Encoded	SGVsbG8=
Padding Cases	0 bytes remainder → no pad; 1 byte → ==; 2 bytes → =
Use Case: API Transport	Embed CSV in JSON without escaping issues
Use Case: Email Attachment	MIME-encoded CSV in email body
Use Case: Data URI	Inline CSV download link in HTML
Use Case: Database Storage	Store CSV blob as text column
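
Two rows of the table, the URL-safe variant and the 76-character MIME line length, are simple post-processing steps on a standard Base64 string. The helpers below are a sketch with hypothetical names; they assume padding is kept in the URL-safe form, although some implementations strip it.

```javascript
// Standard alphabet -> URL-safe alphabet: + becomes -, / becomes _.
function toUrlSafe(b64) {
  return b64.replace(/\+/g, "-").replace(/\//g, "_");
}

// URL-safe alphabet -> standard alphabet.
function fromUrlSafe(urlSafe) {
  return urlSafe.replace(/-/g, "+").replace(/_/g, "/");
}

// Break a Base64 string into lines of at most `width` characters,
// joined with CRLF as used in MIME bodies.
function wrapMime(b64, width = 76) {
  const lines = b64.match(new RegExp(`.{1,${width}}`, "g")) ?? [];
  return lines.join("\r\n");
}

console.log(toUrlSafe("+/8=")); // -_8=
```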

Frequently Asked Questions

The native btoa function only accepts characters in the Latin-1 range (code points 0-255). Characters above that range, such as CJK characters or € (U+20AC), cause a DOMException (InvalidCharacterError). Accented letters that fall inside the range, such as é (U+00E9), do not throw, but btoa encodes them as a single Latin-1 byte rather than their UTF-8 byte sequence, so the result does not decode correctly as UTF-8. This tool solves both problems by first encoding the string to a UTF-8 byte array via TextEncoder, converting each byte to its Latin-1 character equivalent, then applying btoa to the resulting binary string.
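
The behaviour of a naive btoa call can be checked directly. In this small sketch, tryBtoa is a hypothetical wrapper that reports whether the call threw. Note that é (U+00E9) is itself inside Latin-1, so it does not throw; it is encoded as the single byte E9 instead of its two-byte UTF-8 form.

```javascript
// Call btoa and report the outcome instead of letting the exception escape.
function tryBtoa(text) {
  try {
    return { ok: true, value: btoa(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

console.log(tryBtoa("é").value); // 6Q==  (Latin-1 byte E9, not the UTF-8 result w6k=)
console.log(tryBtoa("5 €").ok);  // false (€ is U+20AC, outside the 0-255 range)
```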

Enabling the MIME prefix prepends data:text/csv;base64, to the Base64 string. This creates a valid Data URI (RFC 2397) that browsers can interpret as a downloadable CSV file. You can paste it directly into an HTML anchor tag's href attribute. Without the prefix, the output is a raw Base64 string suitable for JSON payloads, API bodies, or database storage. Note that Data URIs have browser-specific size limits (typically 2MB in older browsers).
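
A Data URI of the kind described above can be assembled as follows. This is a sketch under assumptions: csvToDataUri and the file name export.csv are hypothetical, and the prefix is the one this page documents.

```javascript
// Build a data:text/csv;base64, URI from CSV text (UTF-8 bytes, then Base64).
function csvToDataUri(csv) {
  const bytes = new TextEncoder().encode(csv);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return "data:text/csv;base64," + btoa(binary);
}

console.log(csvToDataUri("A")); // data:text/csv;base64,QQ==

// In a browser, the URI can back a download link, for example:
//   const link = document.createElement("a");
//   link.href = csvToDataUri("id,name\n1,Zoë\n");
//   link.download = "export.csv";
//   link.textContent = "Download CSV";
```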

CSV files use different line ending conventions: Windows uses CRLF (\r\n, 2 bytes), while Unix/macOS uses LF (\n, 1 byte). This tool preserves original line endings by default. If you normalize line endings before encoding, the decoded output may differ from the original file. The line ending style affects the byte count and therefore the Base64 output. A 100-line file is 99 bytes larger with CRLF than with LF, or 100 bytes larger if the last line also ends with a line break.
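
The byte difference between the two conventions is easy to measure. The sketch below builds a hypothetical 100-line CSV (the row contents are made up for illustration) and compares UTF-8 byte counts.

```javascript
// UTF-8 byte length of a string.
function byteLength(text) {
  return new TextEncoder().encode(text).length;
}

// 100 illustrative rows, joined without a trailing line break.
const rows = Array.from({ length: 100 }, (_, i) => `row${i},value`);
const lf = rows.join("\n");
const crlf = rows.join("\r\n");

console.log(byteLength(crlf) - byteLength(lf)); // 99 (one extra \r per separator)

// With a trailing line break on the last row there are 100 separators.
console.log(byteLength(crlf + "\r\n") - byteLength(lf + "\n")); // 100
```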

Yes. Base64 encoding is content-agnostic. It operates on raw bytes, not CSV structure. Embedded quotes, commas, newlines within quoted fields, and any special characters are encoded as part of the byte stream. The CSV parsing rules (RFC 4180) are irrelevant during encoding. The original structure is perfectly preserved upon decoding. This is precisely why Base64 is preferred for transport: it eliminates all delimiter collision issues.

This tool accepts files up to 10MB. The practical limit is browser memory. A 10MB CSV produces approximately 13.3MB of Base64 output. Both must fit in memory simultaneously, totaling ≈ 23.3MB of RAM. Modern browsers handle this without issue. For files exceeding 10MB, consider server-side encoding or streaming approaches.
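
One possible streaming-style approach, sketched here as an assumption rather than a description of this tool, is to encode the input in chunks whose size is a multiple of 3 bytes. Because every chunk except the last then ends on a group boundary, no padding appears mid-stream and the concatenated pieces equal the Base64 of the whole input. The name encodeInChunks and the default chunk size are hypothetical.

```javascript
// Encode a Uint8Array piece by piece; chunkSize must be a multiple of 3
// so that only the final chunk can produce "=" padding.
function encodeInChunks(bytes, chunkSize = 3 * 8192) {
  if (chunkSize <= 0 || chunkSize % 3 !== 0) {
    throw new RangeError("chunkSize must be a positive multiple of 3");
  }
  const parts = [];
  for (let i = 0; i < bytes.length; i += chunkSize) {
    const chunk = bytes.subarray(i, i + chunkSize);
    parts.push(btoa(String.fromCharCode(...chunk)));
  }
  return parts.join("");
}

const bytes = new TextEncoder().encode("id,name\n1,Zoë\n2,Łukasz\n");
console.log(encodeInChunks(bytes, 6) === encodeInChunks(bytes)); // true
```

In a real pipeline each chunk would come from reading a slice of the file, and each encoded part could be written out immediately instead of being collected in memory.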

Use the bidirectional mode. Paste your CSV, encode it, then switch to decode mode and paste the Base64 output. The decoded result must match the original input byte-for-byte. You can also verify the output length: it must be divisible by 4 and satisfy n_out = 4 ⋅ ceil(n_in ÷ 3). The output must contain only characters from the Base64 alphabet plus optional = padding at the end.
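
The structural checks described above (alphabet, length divisible by 4, expected length) can be automated. This is a minimal sketch with a hypothetical function name; it checks form only and does not replace the round-trip comparison.

```javascript
// Check that a string is well-formed standard Base64 and, when the input
// byte count nIn is given, that its length equals 4 * ceil(nIn / 3).
function looksLikeBase64(b64, nIn) {
  const alphabetOk = /^[A-Za-z0-9+/]*={0,2}$/.test(b64); // padding only at the end
  const lengthOk = b64.length % 4 === 0;
  const sizeOk = nIn === undefined || b64.length === 4 * Math.ceil(nIn / 3);
  return alphabetOk && lengthOk && sizeOk;
}

console.log(looksLikeBase64("SGVsbG8=", 5)); // true  ("Hello" is 5 bytes)
console.log(looksLikeBase64("SGVsbG8"));     // false (length 7 is not divisible by 4)
```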