About

Delimiter-separated value files appear simple until a quoted field contains the delimiter itself, a literal newline, or an escaped double-quote. A naive split on d (where d is your delimiter character) corrupts such data. This tool implements an RFC 4180 state-machine parser that handles these edge cases, including escaped quotes (""), multiline fields, and BOM markers, before re-serializing to your chosen output separator. Auto-detection analyzes candidate-delimiter frequency across the first 10 rows to infer the source delimiter. All processing runs client-side; no data leaves your browser.
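To make the failure mode concrete, here is a minimal Python sketch comparing a naive split with quote-aware parsing. It uses Python's standard csv module purely as a stand-in for a quote-aware parser; it is not this tool's own implementation, and the sample data is invented for illustration.

```python
import csv
import io

# Invented sample: the second record has a comma and escaped quotes inside quoted fields.
text = 'id,name,note\n1,"Smith, Jane","said ""hi"""\n'

# Naive approach: split each line on the delimiter and ignore quoting.
naive_rows = [line.split(",") for line in text.splitlines()]

# Quote-aware approach using the standard library.
parsed_rows = list(csv.reader(io.StringIO(text, newline="")))

print(naive_rows[1])   # the name is cut in two and the quote characters are kept
print(parsed_rows[1])  # three fields, quotes unescaped
```

The naive version reports four columns for a three-column record, which is exactly the kind of silent column shift a quote-aware parser avoids.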

The term 0SV refers to "zero-assumption separated values": the output delimiter is not fixed to commas or tabs but is any single character or string you specify. This matters when piping data between systems that disagree on format: PostgreSQL's COPY command expects tab-separated input in its default text format, Excel may default to semicolons depending on regional settings, and Unix tools like awk split on whitespace by default. Mismatched delimiters cause silent column shifts that propagate errors downstream. Converting through a quote-aware parser avoids that risk.


Formulas

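The parser operates as a finite state machine with three states. Given input string $S$, source delimiter $d_{in}$, and quote character $q$ (default "), each character $c_i$ triggers a state transition:

$$
\begin{cases}
\text{FIELD\_START} \to \text{IN\_QUOTED} & \text{if } c_i = q \\
\text{FIELD\_START} \to \text{IN\_UNQUOTED} & \text{if } c_i \neq q \\
\text{IN\_QUOTED} \to \text{FIELD\_START} & \text{if } c_i = q \land c_{i+1} \neq q \\
\text{IN\_UNQUOTED} \to \text{FIELD\_START} & \text{if } c_i = d_{in} \lor c_i \in \{\texttt{\textbackslash n}, \texttt{\textbackslash r\textbackslash n}\}
\end{cases}
$$

Escaped quotes are detected when $c_i = q \land c_{i+1} = q$ inside the IN_QUOTED state; the pair is collapsed to a single $q$ in the output.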
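The transitions above can be sketched in Python roughly as follows. This is an illustrative sketch, not the tool's actual source: the function name parse_dsv and its parameters are hypothetical. One simplification is an assumption of this sketch: after a closing quote it moves to IN_UNQUOTED and lets the following delimiter or newline finish the field, which is one way to realize the IN_QUOTED to FIELD_START transition.

```python
def parse_dsv(text, delimiter=",", quote='"'):
    """Parse delimiter-separated text into a list of rows (hypothetical helper)."""
    FIELD_START, IN_QUOTED, IN_UNQUOTED = range(3)
    rows, row, field = [], [], []
    state = FIELD_START
    i, n = 0, len(text)

    def end_field():
        row.append("".join(field))
        field.clear()

    def end_row():
        end_field()
        rows.append(list(row))
        row.clear()

    while i < n:
        c = text[i]
        if state == IN_QUOTED:
            if c == quote:
                if i + 1 < n and text[i + 1] == quote:
                    field.append(quote)      # escaped pair collapses to one quote
                    i += 1
                else:
                    state = IN_UNQUOTED      # closing quote (assumption: see lead-in)
            else:
                field.append(c)              # delimiters and newlines are literal here
        elif c == quote and state == FIELD_START:
            state = IN_QUOTED
        elif text.startswith(delimiter, i):
            end_field()
            state = FIELD_START
            i += len(delimiter) - 1          # skip the rest of a multi-character delimiter
        elif c in "\r\n":
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                       # treat \r\n as a single terminator
            end_row()
            state = FIELD_START
        else:
            field.append(c)
            state = IN_UNQUOTED
        i += 1

    # Flush a final record that has no trailing newline.
    if field or row or state != FIELD_START:
        end_row()
    return rows


sample = 'a,"b,1","say ""hi"""\r\n"line 1\nline 2",x,\n'
for parsed in parse_dsv(sample):
    print(parsed)
```

In this sketch a blank line yields a row with one empty field; a production parser may prefer to skip such lines.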

For re-serialization to output delimiter $d_{out}$, each field $f$ is wrapped in quotes if and only if:

$$
\text{needsQuote}(f) = (f \text{ contains } d_{out}) \lor (f \text{ contains } q) \lor (f \text{ contains } \texttt{\textbackslash n})
$$

Where $d_{in}$ = source delimiter, $d_{out}$ = target delimiter, $q$ = quote character, $c_i$ = character at position $i$, and $f$ = a single parsed field value.
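As a rough illustration of the quoting rule, the following Python sketch re-serializes one parsed row. The names needs_quote and serialize_row are hypothetical. The check for a bare carriage return goes beyond the formula and is an assumption of this sketch, added so that fields with legacy line endings also survive a round trip.

```python
def needs_quote(field, out_delim, quote='"'):
    """Quoting rule from the formula, plus a bare-CR check (an assumption)."""
    return (
        out_delim in field
        or quote in field
        or "\n" in field
        or "\r" in field
    )


def serialize_row(fields, out_delim, quote='"'):
    """Join fields with out_delim, quoting only the fields that need it."""
    out = []
    for f in fields:
        if needs_quote(f, out_delim, quote):
            # Double any internal quote characters, then wrap the field.
            f = quote + f.replace(quote, quote * 2) + quote
        out.append(f)
    return out_delim.join(out)


print(serialize_row(["a,b", "x|y", 'say "hi"'], "|"))
```

Because the check uses substring containment, the same sketch works unchanged when the output delimiter is a multi-character string such as ::.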

Reference Data

| Format Name | Delimiter | Common Extension | Typical Use Case | RFC / Standard | Quoting Rule |
|---|---|---|---|---|---|
| CSV (Comma) | , | .csv | Spreadsheets, databases, general interchange | RFC 4180 | Double-quote fields containing delimiter or newline |
| TSV (Tab) | \t | .tsv | Bioinformatics, PostgreSQL COPY, Unix tools | IANA text/tab-separated-values | Rarely quoted; tabs in data are escaped |
| SSV (Semicolon) | ; | .csv | European Excel exports (comma is decimal separator) | De facto (ISO locale-dependent) | Same as CSV but with semicolon |
| PSV (Pipe) | \| | .psv / .txt | EDI, HL7 health data, legacy mainframes | HL7 v2.x, X12 EDI | Rarely quoted; pipe uncommon in text |
| Colon-SV | : | .txt | /etc/passwd, Unix config files | POSIX convention | No quoting; fields must not contain colons |
| Space-SV | ␣ | .txt | Fixed-width simulation, simple logs | None | Problematic: spaces in data cause misalignment |
| Tilde-SV | ~ | .txt | Legacy banking, NACHA ACH files | NACHA specification | No standard quoting |
| Caret-SV | ^ | .txt | Custom ETL pipelines where other delimiters conflict | None | Application-specific |
| NULL-SV | \0 | Binary | xargs -0, find -print0 (filenames with spaces) | POSIX | No quoting needed: NULL never appears in text |
| SOH-SV | \x01 | Binary | Hive default, internal database interchange | Apache Hive convention | No quoting needed: SOH is non-printable |
| RS/GS-SV | \x1E / \x1D | Binary | ASCII record/group separation (ISO 646) | ISO 646, ASCII control chars | Purpose-built; no conflicts |
| Multi-char | :: or \|-\| | .txt | Custom formats where single-char delimiters conflict | None | Application-specific escaping |

Frequently Asked Questions

The parser reads the first 10 lines of the file and counts occurrences of candidate delimiters (comma, semicolon, tab, pipe). It then checks which candidate produces a consistent column count across all sampled rows. The candidate with the highest consistency score and frequency wins. If multiple candidates tie, priority follows the order: tab, comma, semicolon, pipe - matching the most common real-world conventions.
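The detection heuristic described above might look roughly like this in Python. The function name detect_delimiter and the exact scoring tuple are assumptions of this sketch; the text only specifies the 10-line sample, the candidate set, and the tie-break order. For brevity the sketch counts raw characters and does not skip delimiters that appear inside quoted fields.

```python
from collections import Counter

# Candidates listed in the tie-break priority order given in the text.
CANDIDATES = ["\t", ",", ";", "|"]


def detect_delimiter(text, sample_rows=10):
    """Guess the delimiter from the first few lines; returns None if nothing fits."""
    lines = [ln for ln in text.splitlines()[:sample_rows] if ln]
    best, best_score = None, (0.0, 0)
    for cand in CANDIDATES:
        counts = [ln.count(cand) for ln in lines]
        total = sum(counts)
        if total == 0:
            continue
        # Consistency: share of sampled lines that agree on the most common count.
        modal_count, agreeing = Counter(counts).most_common(1)[0]
        if modal_count == 0:
            continue  # candidate is absent from most lines
        score = (agreeing / len(lines), total)
        # Strict comparison keeps the earlier (higher-priority) candidate on ties.
        if score > best_score:
            best, best_score = cand, score
    return best


print(repr(detect_delimiter("a;b;c\n1,5;2;3\n4;5;6\n")))
```

Here the stray comma in the second line is outvoted because the semicolon count is identical on every sampled line.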
The RFC 4180 state machine correctly handles embedded newlines. When the parser enters the IN_QUOTED state upon encountering an opening double-quote, newline characters (\n or \r\n) are treated as literal field content rather than row terminators. The field only ends when a closing quote is followed by a delimiter or end-of-line. This means a field like "Line 1\nLine 2" is preserved as a single cell value in the output.
Multi-character output delimiters are supported. The converter accepts arbitrary-length delimiter strings such as :: or |-| or even words like [SEP]. The quoting logic adapts: if any field contains the full multi-character delimiter as a substring, that field is wrapped in quotes. Be aware that most standard tools (Excel, pandas read_csv) expect single-character delimiters, so multi-character separators are best suited for custom ETL pipelines.
Per RFC 4180, a literal double-quote inside a quoted field is represented as two consecutive double-quotes (""). The parser detects this pair in the IN_QUOTED state and collapses it to a single quote character in the parsed output. During re-serialization, if the field requires quoting (because it contains the output delimiter, a newline, or a quote), any internal quotes are re-escaped as double-quotes.
Line endings are normalized before processing. The parser recognizes \r\n (Windows/CRLF), \n (Unix/LF), and \r (legacy Mac/CR), and treats all of them as equivalent row terminators outside of quoted fields. The output uses \n (Unix LF) by default, which modern systems accept.
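A minimal sketch of such normalization in Python, assuming a simple whole-text replacement (the name normalize_newlines is hypothetical). Note that this blanket approach also rewrites line endings inside quoted fields, which a parser that normalizes only row terminators would leave alone.

```python
def normalize_newlines(text):
    """Convert CRLF and bare CR to LF. CRLF must be handled first so that
    it does not turn into two line feeds."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


mixed = "a,b\r\nc,d\re,f\n"
print(normalize_newlines(mixed).splitlines())
```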
Files under 500 KB are processed on the main thread for instant feedback. Files between 500 KB and approximately 50 MB are offloaded to a Web Worker to prevent UI freezing. The practical upper limit depends on your browser's available memory - typically 100-200 MB for modern browsers. For files exceeding this, consider splitting them with a command-line tool like the Unix split command first.
Inconsistent column counts almost always indicate a parsing error in the source file - typically an unescaped quote or delimiter inside a field that was not properly quoted. The converter reports row-level column count mismatches in the statistics panel. Check the flagged rows in your source data and ensure fields containing special characters are wrapped in double-quotes per RFC 4180.
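To locate offending rows in a source file outside the tool, a check along these lines can help. This is a sketch using Python's standard csv module, not the converter's statistics panel; the function name column_count_mismatches is hypothetical, and treating the most common width as the expected one is an assumption.

```python
import csv
import io
from collections import Counter


def column_count_mismatches(text, delimiter=","):
    """Return (expected_width, [(record_number, width), ...]) for odd records.

    Record numbers are 1-based and count parsed records, so a quoted
    multiline field still counts as one record.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    widths = [(n, len(row)) for n, row in enumerate(reader, start=1) if row]
    if not widths:
        return None, []
    # Assume the most common width is the intended column count.
    expected = Counter(w for _, w in widths).most_common(1)[0][0]
    return expected, [(n, w) for n, w in widths if w != expected]


data = 'id,name,note\n1,Ann,ok\n2,Bob,"fine, thanks"\n3,Cy,oops, extra\n'
print(column_count_mismatches(data))
```

In the sample, the third data record has an unquoted comma in its last field, so it parses as four columns and is the only record flagged.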