About

Delimiter-separated value files appear simple until a quoted field contains the delimiter itself, a literal newline, or an escaped double-quote. A naive split on d (where d is your delimiter character) will then corrupt structured data. This tool implements an RFC 4180 state-machine parser that handles these edge cases, namely escaped quotes (""), multiline fields, and BOM markers, before re-serializing to your chosen output separator. Auto-detection analyzes delimiter frequency and column-count consistency across the first 10 rows to infer the source delimiter. All processing runs client-side; no data leaves your browser.
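To make the failure mode concrete, the following sketch contrasts a naive split with a quote-aware parse. It uses Python's standard csv module purely as a stand-in for a quote-aware parser; it is not this tool's own code, and the sample data is made up for illustration.

```python
import csv
import io

# Hypothetical sample: the second row has a comma and an escaped quote inside quoted fields.
text = 'id,name,note\n1,"Smith, Jane","said ""hi"""\n'

# Naive approach: split every line on the delimiter character.
naive_rows = [line.split(",") for line in text.splitlines()]

# Quote-aware approach: the standard library's csv reader.
parsed_rows = list(csv.reader(io.StringIO(text)))

print(naive_rows[1])   # ['1', '"Smith', ' Jane"', '"said ""hi"""']  (4 columns, quotes left in)
print(parsed_rows[1])  # ['1', 'Smith, Jane', 'said "hi"']           (3 columns, as intended)
```

The naive result has one column too many, which is exactly the kind of silent column shift that a state-machine parser is meant to prevent.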

The term 0SV refers to "zero-assumption separated values": the output delimiter is not fixed to commas or tabs but can be any single character or string you specify. This matters when piping data between systems that disagree on format: a PostgreSQL COPY command in its default text format expects tab-separated input, Excel may default to semicolons depending on regional settings, and Unix tools like awk split on whitespace by default. Mismatched delimiters cause silent column shifts that propagate errors downstream. Converting explicitly to the delimiter the target system expects avoids that risk.

Formulas

The parser operates as a finite state machine with three states. Given input string S, source delimiter d_in, and quote character q (default "), each character c_i triggers a state transition:

FIELD_START → IN_QUOTED if c_i = q
FIELD_START → IN_UNQUOTED if c_i ≠ q
IN_QUOTED → FIELD_START if c_i = q ∧ c_i+1 ≠ q
IN_UNQUOTED → FIELD_START if c_i = d_in ∨ c_i ∈ {\n, \r\n}

Escaped quotes are detected when c_i = q ∧ c_i+1 = q inside IN_QUOTED state - the pair is collapsed to a single q in output.
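The transitions above can be sketched as a small character-by-character parser. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the function name parse and the helper names are hypothetical, d_in is limited to a single character, and after a closing quote the sketch moves to IN_UNQUOTED and waits for the delimiter (rather than jumping straight to FIELD_START) so that an empty quoted field at end of input is still emitted.

```python
from typing import List

FIELD_START, IN_QUOTED, IN_UNQUOTED = range(3)


def parse(text: str, d_in: str = ",", q: str = '"') -> List[List[str]]:
    """Parse text into rows of fields. Sketch only: d_in must be one character."""
    rows: List[List[str]] = []
    row: List[str] = []
    field: List[str] = []
    state = FIELD_START
    i, n = 0, len(text)

    def end_field() -> None:
        row.append("".join(field))
        field.clear()

    def end_row() -> None:
        end_field()
        rows.append(list(row))
        row.clear()

    while i < n:
        c = text[i]
        if state == IN_QUOTED:
            if c == q:
                if i + 1 < n and text[i + 1] == q:
                    field.append(q)       # escaped pair: collapse "" to "
                    i += 1                # skip the second quote of the pair
                else:
                    state = IN_UNQUOTED   # closing quote: wait for delimiter or newline
            else:
                field.append(c)           # delimiters and newlines are literal content here
        elif c == q and state == FIELD_START:
            state = IN_QUOTED             # opening quote
        elif c == d_in:
            end_field()
            state = FIELD_START
        elif c in "\r\n":
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                    # treat \r\n as one row terminator
            end_row()
            state = FIELD_START
        else:
            field.append(c)
            state = IN_UNQUOTED
        i += 1

    # Flush a final row that is not terminated by a newline.
    if field or row or state != FIELD_START:
        end_row()
    return rows
```

For example, parse('a,"b,1","say ""hi"""\r\n') returns a single row with the three fields a, b,1 and say "hi".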

For re-serialization to output delimiter d_out, each field f is wrapped in quotes if and only if:

needsQuote(f) = f contains d_out ∨ f contains q ∨ f contains \n

Where d_in = source delimiter, d_out = target delimiter, q = quote character, c_i = character at position i, and f = a single parsed field value.
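As a rough sketch of the re-serialization rule, the predicate and a row writer might look like the following. The function names needs_quote and serialize_row are hypothetical, and the predicate checks only the three conditions listed in the formula (it does not, for instance, test for a bare \r).

```python
from typing import Iterable


def needs_quote(f: str, d_out: str, q: str = '"') -> bool:
    """True when the field contains the output delimiter, the quote char, or a newline."""
    return d_out in f or q in f or "\n" in f


def serialize_row(fields: Iterable[str], d_out: str, q: str = '"') -> str:
    """Join fields with d_out, quoting only where needed and doubling inner quotes."""
    out = []
    for f in fields:
        if needs_quote(f, d_out, q):
            f = q + f.replace(q, q + q) + q
        out.append(f)
    return d_out.join(out)


print(serialize_row(["a;b", 'say "hi"', "plain"], ";"))  # "a;b";"say ""hi""";plain
```

Because the check is a substring test, the same code works unchanged when d_out is a multi-character string such as ::.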

Reference Data

Format Name	Delimiter	Common Extension	Typical Use Case	RFC / Standard	Quoting Rule
CSV (Comma)	,	.csv	Spreadsheets, databases, general interchange	RFC 4180	Double-quote fields containing delimiter or newline
TSV (Tab)	\t	.tsv	Bioinformatics, PostgreSQL COPY, Unix tools	IANA text/tab-separated-values	Rarely quoted; tabs in data are escaped
SSV (Semicolon)	;	.csv	European Excel exports (comma is decimal separator)	De facto (ISO locale-dependent)	Same as CSV but with semicolon
PSV (Pipe)	|	.psv / .txt	EDI, HL7 health data, legacy mainframes	HL7 v2.x, X12 EDI	Rarely quoted; pipe uncommon in text
Colon-SV	:	.txt	/etc/passwd, Unix config files	POSIX convention	No quoting; fields must not contain colons
Space-SV	␣	.txt	Fixed-width simulation, simple logs	None	Problematic - spaces in data cause misalignment
Tilde-SV	~	.txt	Legacy banking exports	None (de facto)	No standard quoting
Caret-SV	^	.txt	Custom ETL pipelines where other delimiters conflict	None	Application-specific
NULL-SV	\0	Binary	xargs -0, find -print0 (filenames with spaces)	POSIX	No quoting needed - the NUL byte does not appear in ordinary text
SOH-SV	\x01	Binary	Hive default, internal database interchange	Apache Hive convention	No quoting needed - SOH is non-printable
RS/GS-SV	\x1E / \x1D	Binary	ASCII record/group separation (ISO 646)	ISO 646, ASCII control chars	Purpose-built; no conflicts
Multi-char	:: or |-|	.txt	Custom formats where single-char delimiters conflict	None	Application-specific escaping

Frequently Asked Questions

Auto-detection reads the first 10 lines of the file and counts occurrences of each candidate delimiter (comma, semicolon, tab, pipe). It then checks which candidate produces a consistent column count across all sampled rows. The candidate with the highest consistency score and frequency wins. If multiple candidates tie, priority follows the order tab, comma, semicolon, pipe, which matches the most common real-world conventions.
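One way to read this heuristic is sketched below. The scoring is an assumption made for illustration: consistency is taken as the share of sampled lines that have the most common delimiter count, frequency breaks consistency ties, and the candidate order breaks any remaining tie. The function name detect_delimiter is hypothetical, the sketch counts raw characters without regard to quoting, and it falls back to the first candidate when nothing matches.

```python
from collections import Counter

# Tie-break priority taken from the text: tab, comma, semicolon, pipe.
CANDIDATES = ["\t", ",", ";", "|"]


def detect_delimiter(text: str, sample_rows: int = 10) -> str:
    """Guess the delimiter from the first sample_rows non-empty lines (sketch only)."""
    lines = [ln for ln in text.splitlines()[:sample_rows] if ln]
    best, best_score = CANDIDATES[0], (-1.0, -1)
    for d in CANDIDATES:
        counts = [ln.count(d) for ln in lines]
        if not counts:
            continue
        modal_value, modal_freq = Counter(counts).most_common(1)[0]
        if modal_value == 0:
            continue  # most lines do not contain this candidate at all
        score = (modal_freq / len(counts), sum(counts))
        # Strict ">" keeps the earlier, higher-priority candidate on a full tie.
        if score > best_score:
            best, best_score = d, score
    return best


print(repr(detect_delimiter("a;b;c\n1,5;2;3\n4;5;6\n")))  # ';'
```

In the usage line, the comma appears in only one of three lines, so the semicolon wins even though both characters are present.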

The RFC 4180 state machine correctly handles embedded newlines. When the parser enters the IN_QUOTED state upon encountering an opening double-quote, newline characters (\n or \r\n) are treated as literal field content rather than row terminators. The field only ends when a closing quote is followed by a delimiter or end-of-line. This means a field like "Line 1\nLine 2" is preserved as a single cell value in the output.
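The expected behavior can be checked against any RFC 4180-style parser. The sketch below uses Python's standard csv module as a reference point rather than this tool's own parser, with made-up sample data.

```python
import csv
import io

# Hypothetical input: the second record contains a newline inside a quoted field.
data = 'id,comment\r\n1,"Line 1\nLine 2"\r\n'

# newline="" leaves line endings untouched so the csv reader can interpret them itself.
rows = list(csv.reader(io.StringIO(data, newline="")))

print(len(rows))         # 2
print(repr(rows[1][1]))  # 'Line 1\nLine 2'
```

The input spans three physical lines but yields two records, because the embedded newline stays inside the cell.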

Multi-character delimiters are supported. The converter accepts arbitrary-length delimiter strings such as :: or |-|, or even tokens like [SEP]. The quoting logic adapts: if any field contains the full multi-character delimiter as a substring, that field is wrapped in quotes. Be aware that many standard tools expect single-character delimiters: Excel's import does, and pandas read_csv interprets longer separators as regular expressions and falls back to its slower Python parsing engine. Multi-character separators are therefore best suited to custom ETL pipelines.

Per RFC 4180, a literal double-quote inside a quoted field is represented as two consecutive double-quotes (""). The parser detects this pair in the IN_QUOTED state and collapses it to a single quote character in the parsed output. During re-serialization, if the field requires quoting (because it contains the output delimiter, a newline, or a quote), each internal quote is escaped again by doubling it.

Mixed line endings are handled. The parser normalizes line endings before processing. It recognizes \r\n (Windows/CRLF), \n (Unix/LF), and \r (legacy Mac/CR), and treats all of them as equivalent row terminators outside of quoted fields. The output uses \n (Unix LF) by default, which most modern systems accept; note that RFC 4180 itself specifies CRLF, so strict consumers may require conversion.
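A minimal sketch of such a normalization pass, assuming it runs over the whole input before parsing (the function name normalize_newlines is hypothetical):

```python
import re


def normalize_newlines(text: str) -> str:
    """Map \\r\\n and bare \\r to \\n; the alternation tries \\r\\n first so it is not split."""
    return re.sub(r"\r\n|\r", "\n", text)


print(repr(normalize_newlines("a\r\nb\rc\nd")))  # 'a\nb\nc\nd'
```

Note that a blanket pass like this also rewrites line endings that sit inside quoted fields; a parser that needs to preserve them byte-for-byte would have to handle terminators inside the state machine instead.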

Files under 500 KB are processed on the main thread for instant feedback. Files between 500 KB and approximately 50 MB are offloaded to a Web Worker to prevent UI freezing. The practical upper limit depends on your browser's available memory - typically 100-200 MB for modern browsers. For files exceeding this, consider splitting them with a command-line tool like the Unix split command first.

Inconsistent column counts almost always indicate a malformed source file, typically an unescaped quote or a delimiter inside a field that was not properly quoted. The converter reports row-level column count mismatches in the statistics panel. Check the flagged rows in your source data and ensure fields containing special characters are wrapped in double-quotes per RFC 4180.
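A row-level mismatch report of this kind could be computed roughly as follows. This is a sketch under an assumption: the expected width is taken to be the most common column count among the parsed rows, which may differ from how the statistics panel actually decides it. The function name column_mismatches is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def column_mismatches(rows: List[List[str]]) -> List[Tuple[int, int, int]]:
    """Return (1-based row number, actual width, expected width) for each odd row."""
    if not rows:
        return []
    # Assumption: the most common width is the intended one.
    expected = Counter(len(r) for r in rows).most_common(1)[0][0]
    return [(i + 1, len(r), expected) for i, r in enumerate(rows) if len(r) != expected]


sample = [["a", "b", "c"], ["1", "2", "3"], ["4", "5"], ["6", "7", "8"]]
print(column_mismatches(sample))  # [(3, 2, 3)]
```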