About

CSV files encode tabular data as delimiter-separated values; RFC 4180 standardizes the comma-delimited form. Mishandling quoted fields, embedded newlines, or escaped double-quotes ("") during conversion silently corrupts the output: a naive split on commas breaks any quoted field that contains a comma. This tool implements a finite-state-machine parser that applies the RFC 4180 quoting rules to these cases. It supports four delimiters: comma (,), semicolon (;), tab (\t), and pipe (|).
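The failure mode of a naive split is easy to reproduce. This sketch uses Python's standard csv module as a stand-in for a quote-aware parser; it is not this tool's own code.

```python
import csv
import io

line = 'id,"Smith, John",42'

# Naive approach: split on every comma and ignore quoting entirely.
naive = line.split(",")

# Quote-aware approach: the standard csv module follows RFC 4180 quoting.
correct = next(csv.reader(io.StringIO(line)))

print(naive)    # ['id', '"Smith', ' John"', '42']
print(correct)  # ['id', 'Smith, John', '42']
```

The naive split yields four columns instead of three and leaves stray quote characters in two of them.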

The converter transforms each CSV row into a multiline block where every field occupies its own line. Optional header labeling prepends column names to each value. This is useful for converting dense spreadsheet exports into human-readable reports, log entries, or configuration files. Note: this tool operates entirely client-side. No data leaves your browser. Maximum practical file size depends on available browser memory, typically 50MB for modern browsers.
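The row-to-block conversion can be sketched as follows. The exact output layout is not specified above, so this assumes a `name: value` label format and a blank line between records; the function name `rows_to_multiline`, the `label_sep` parameter, and the `column{n}` fallback label are all hypothetical.

```python
def rows_to_multiline(rows, headers=None, label_sep=": "):
    """Render each parsed row as a block with one field per line (sketch).

    If headers are given, each value is prefixed with its column name.
    Blocks are separated by a blank line (an assumed convention).
    """
    blocks = []
    for row in rows:
        lines = []
        for i, value in enumerate(row):
            if headers is not None:
                # Hypothetical fallback label when a row has more fields than headers.
                name = headers[i] if i < len(headers) else f"column{i + 1}"
                lines.append(f"{name}{label_sep}{value}")
            else:
                lines.append(value)
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


print(rows_to_multiline([["Alice", "30"], ["Bob", "25"]], headers=["name", "age"]))
```

With headers, the call above prints `name: Alice` and `age: 30`, a blank line, then `name: Bob` and `age: 25`.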

Formulas

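The CSV parser operates as a finite state machine with three states:

S₀ = FIELD_START: expecting a new field
S₁ = UNQUOTED: reading unquoted field characters
S₂ = QUOTED: inside a double-quoted field

State transitions on character c at position i:

S₀ + c = " → S₂

S₀ + c = delim → emit empty field → S₀

S₀ + c = any other character → append c → S₁

S₁ + c = delim → emit field → S₀

S₂ + c = " followed by " → emit one literal " → S₂

S₂ + c = " not followed by " → close quoted field → S₁ (the field is emitted at the next delimiter or newline)

Where delim ∈ {, ; \t |}. Outside S₂, a newline ends both the current field and the current row; inside S₂ it is kept as part of the field. The parser scans the input string in O(n) time with a single pass, where n is the character count of the input.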
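A minimal Python sketch of the three-state machine above, not the tool's actual source. Assumptions: rows end at LF, CR, or CRLF outside quotes; a closing quote moves to S₁ so the following delimiter emits the field; stray characters after a closing quote, and an unterminated quote at end of input, are accepted leniently. The name `parse_csv` is hypothetical.

```python
FIELD_START, UNQUOTED, QUOTED = 0, 1, 2  # S₀, S₁, S₂


def parse_csv(text: str, delim: str = ",") -> list[list[str]]:
    """Single-pass O(n) parser following the three-state machine (sketch)."""
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    state = FIELD_START
    i, n = 0, len(text)

    def end_field() -> None:
        row.append("".join(field))
        field.clear()

    def end_row() -> None:
        end_field()
        rows.append(row.copy())
        row.clear()

    while i < n:
        c = text[i]
        if state == QUOTED:
            if c == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # "" -> one literal quote, stay in S₂
                    i += 1
                else:
                    state = UNQUOTED   # closing quote; field ends at next delim/newline
            else:
                field.append(c)        # delimiters and newlines are literal inside quotes
        elif c == '"' and state == FIELD_START:
            state = QUOTED             # S₀ + " -> S₂
        elif c == delim:
            end_field()                # S₀/S₁ + delim -> emit field -> S₀
            state = FIELD_START
        elif c in "\r\n":
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                 # treat CRLF as a single line break
            end_row()
            state = FIELD_START
        else:
            field.append(c)            # S₀/S₁ + other -> append -> S₁
            state = UNQUOTED
        i += 1

    # Flush the last row if the input does not end with a line break.
    if field or row or state != FIELD_START:
        end_row()
    return rows
```

Because the closing quote moves to S₁ rather than S₀, a delimiter right after a quoted field emits that field instead of an extra empty one.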

Reference Data

Element	Symbol	Common Sources	Standard	Risk if Mishandled
Comma	,	Excel, Google Sheets, most databases	RFC 4180	Breaks on currency values like 1,000.00
Semicolon	;	European Excel exports (locale-dependent)	No formal RFC	Misidentified as comma-delimited
Tab	\t	TSV files, database dumps, clipboard paste	IANA text/tab-separated-values	Invisible character causes silent parse errors
Pipe	|	Legacy mainframe exports, HL7 medical data	No formal RFC	Rare in field data, low collision risk
Double-quote escape	""	Any quoted CSV field containing quotes	RFC 4180 §2.7	Unescaped quotes break field boundaries
Quoted newline	\n inside "..."	Address fields, descriptions, notes	RFC 4180 §2.6	Row count mismatch if split on newline
BOM marker	\uFEFF	UTF-8 files from Windows Notepad	Unicode §2.13	Corrupts first field name
CRLF line ending	\r\n	Windows-generated files	RFC 4180 §2.1	Extra blank lines in Unix environments
Empty fields	,,	Sparse datasets, optional columns	RFC 4180 §2.4	Off-by-one column alignment errors
Trailing delimiter	a,b,c,	Some database exports	Not standard	Creates phantom empty last column
Mixed quoting	a,"b",c	Hand-edited CSV files	RFC 4180 §2.5	Inconsistent but valid per spec
UTF-8 encoding	Multi-byte chars	International datasets	RFC 3629	Mojibake if decoded as Latin-1

Frequently Asked Questions

Per RFC 4180 §2.6, any field containing the delimiter, double-quotes, or line breaks should be enclosed in double-quotes. The parser enters the QUOTED state on an opening quote and does not treat delimiter characters as field separators until the closing quote is reached. If your CSV has unquoted fields with embedded delimiters, the parser will produce incorrect column counts; this indicates a malformed CSV source.
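Incorrect column counts can be detected by comparing every record's width against the header. This sketch uses Python's csv module; `find_ragged_rows` is a hypothetical helper, and record numbers count parsed records, not physical lines, since quoted fields may span lines.

```python
import csv
import io


def find_ragged_rows(text: str, delim: str = ",") -> list[tuple[int, int]]:
    """Return (record_number, field_count) for records whose width differs from the header."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delim))
    if not rows:
        return []
    expected = len(rows[0])
    return [(i + 1, len(r)) for i, r in enumerate(rows) if len(r) != expected]


sample = 'city,amount\nParis,"1,000.00"\nBerlin,1,000.00\n'
print(find_ragged_rows(sample))  # [(3, 3)]
```

The quoted amount on record 2 parses as one field; the unquoted amount on record 3 splits into two, giving three fields instead of two.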

RFC 4180 §2.7 specifies that a double-quote inside a quoted field is escaped by preceding it with another double-quote. So the value She said "Hello" is encoded as "She said ""Hello""". The parser detects consecutive double-quotes inside the QUOTED state and emits a single literal quote character. Backslash escapes (\") are not part of standard CSV: the backslash is kept as a literal character, and the quote that follows it is interpreted by the normal quoting rules.
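Encoding a value under §2.7 amounts to doubling every embedded quote and wrapping the result in quotes. A short sketch with a hypothetical `quote_field` helper, round-tripped through Python's csv module:

```python
import csv


def quote_field(value: str) -> str:
    """Encode a value as an RFC 4180 quoted field by doubling embedded quotes."""
    return '"' + value.replace('"', '""') + '"'


encoded = quote_field('She said "Hello"')
print(encoded)  # "She said ""Hello"""

# Round trip: a quote-aware reader collapses each "" back to a single ".
decoded = next(csv.reader([encoded]))[0]
print(decoded)  # She said "Hello"
```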

Yes, the delimiter can be auto-detected. The algorithm counts occurrences of each candidate delimiter (comma, semicolon, tab, pipe) in the first 5 lines, ignoring characters inside quoted regions. The delimiter with the most consistent count across those lines wins. This heuristic works for well-formed files but may fail on single-column CSVs or on files where several candidate delimiters appear equally often. In ambiguous cases, select the delimiter manually.
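The tool's exact scoring rule is not given, so this sketch assumes "most consistent" means the lowest variance of per-line counts, requires the delimiter on every sampled line, breaks ties by the higher count, and falls back to comma. All names here are hypothetical. Quote state is tracked per physical line, so a quoted field spanning lines can skew the sample.

```python
from statistics import pvariance

CANDIDATES = [",", ";", "\t", "|"]


def count_outside_quotes(line: str, delim: str) -> int:
    """Count delimiter occurrences that are not inside double-quoted regions."""
    count, in_quotes = 0, False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes  # an escaped "" toggles twice: no net change
        elif ch == delim and not in_quotes:
            count += 1
    return count


def detect_delimiter(text: str, sample_lines: int = 5) -> str:
    lines = [ln for ln in text.splitlines() if ln.strip()][:sample_lines]
    best, best_key = ",", None  # assumed fallback: comma
    for d in CANDIDATES:
        counts = [count_outside_quotes(ln, d) for ln in lines]
        if not counts or min(counts) == 0:
            continue  # candidate must appear on every sampled line
        # Lower variance = more consistent; then prefer the higher count.
        key = (pvariance(counts), -counts[0])
        if best_key is None or key < best_key:
            best, best_key = d, key
    return best


print(repr(detect_delimiter('a;b;c\n1;2;3\n"x,y";4;5\n')))  # ';'
```

The quoted comma in the third line is not counted, so comma never appears on every sampled line and semicolon wins.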

The tool operates on the browser's native UTF-16 string representation. When you paste text or load a UTF-8 file, the FileReader API decodes it correctly. BOM (Byte Order Mark) characters at position 0 are automatically stripped. However, files encoded in legacy formats such as Windows-1252 or Shift-JIS may produce garbled output; convert them to UTF-8 before loading.
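The browser relies on FileReader, but the same decoding steps can be illustrated in Python. In this sketch, `decode_utf8` and `strip_bom` are hypothetical helpers; the `utf-8-sig` and `cp1252` codecs are part of Python's standard library.

```python
BOM = "\ufeff"


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes; the 'utf-8-sig' codec drops a leading BOM if present."""
    return data.decode("utf-8-sig")


def strip_bom(text: str) -> str:
    """Remove a BOM at position 0 of an already-decoded string."""
    return text[1:] if text.startswith(BOM) else text


raw = b"\xef\xbb\xbfname,city\nJos\xc3\xa9,K\xc3\xb6ln\n"
print(decode_utf8(raw).splitlines()[0])  # name,city

# Decoding UTF-8 bytes as Latin-1 yields mojibake rather than an error.
print("é".encode("utf-8").decode("latin-1"))  # Ã©

# Converting a Windows-1252 file to UTF-8 requires knowing its encoding up front.
utf8_bytes = b"Caf\xe9".decode("cp1252").encode("utf-8")
```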

Processing happens entirely in browser memory. Modern browsers allocate roughly 1-4 GB to a tab. A practical limit is approximately 50 MB of CSV text, which corresponds to roughly 500,000 rows of typical tabular data. Beyond this, the browser tab may become unresponsive. For very large files, consider splitting them before loading with a CSV-aware tool such as xsv split; a line-based tool such as Unix split can cut through quoted fields that contain newlines.
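The 50 MB to 500,000 rows estimate implies about 100 bytes per row (50,000,000 / 500,000). To split a file without breaking quoted newlines, chunk on parsed records rather than physical lines. A sketch using Python's csv module, with a hypothetical `split_csv` helper that repeats the header in every chunk:

```python
import csv
import io


def split_csv(text: str, rows_per_chunk: int, delim: str = ",") -> list[str]:
    """Split CSV text into chunks of at most rows_per_chunk records each.

    The header row is repeated at the top of every chunk. Because the csv
    module parses whole records, quoted fields containing newlines stay intact.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delim)
    header = next(reader, None)
    if header is None:
        return []

    def render(rows: list[list[str]]) -> str:
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter=delim, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
        return buf.getvalue()

    chunks, current = [], []
    for row in reader:
        current.append(row)
        if len(current) == rows_per_chunk:
            chunks.append(render(current))
            current = []
    if current:
        chunks.append(render(current))
    return chunks
```

Holding the whole text in memory keeps the sketch short; for files larger than memory, the same loop can read from an open file object instead of a string.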

Consecutive delimiters (e.g., a,,c) produce an empty string for the middle field. This is correct per RFC 4180 §2.4. Trailing delimiters at the end of a row (e.g., a,b,c,) create an additional empty field at the end. The converter faithfully represents these as blank lines in the multiline output. If header labeling is enabled, empty fields show the header name followed by an empty value.
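Both cases can be checked with Python's csv module, which also yields an empty string for a,,c and a fourth, empty field for a,b,c,. The "?" label for the headerless phantom column is a hypothetical placeholder, not the tool's behavior.

```python
import csv

records = ["a,,c", "a,b,c,"]
headers = ["x", "y", "z"]

# csv.reader accepts any iterable of lines; parse each record on its own.
parsed = [next(csv.reader([r])) for r in records]
print(parsed)  # [['a', '', 'c'], ['a', 'b', 'c', '']]

# Label each value; the phantom field from the trailing delimiter has no header,
# so this sketch marks it with a hypothetical "?" placeholder.
labeled = [
    [f"{headers[i] if i < len(headers) else '?'}: {value}" for i, value in enumerate(row)]
    for row in parsed
]
for block in labeled:
    print("\n".join(block))
    print()
```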