
About

Switching a CSV file's column delimiter is deceptively error-prone. A naive find-and-replace corrupts any field that contains the delimiter character inside quoted text. This tool implements a full RFC 4180 state-machine parser that correctly handles quoted fields, escaped double-quotes (""), and embedded newlines before re-serializing with your target delimiter. It auto-detects the source delimiter by frequency analysis across the first 20 rows, counting only characters outside quoted regions.

Failure to handle quoting rules when changing delimiters is the most common cause of column-shift corruption in data pipelines. This is especially critical when moving data between European-locale systems (semicolon-delimited) and US/international systems (comma-delimited). The tool re-applies minimal quoting on output: a field is quoted only when it contains the new delimiter, a double-quote, or a newline character. Limitations: files over 50 MB may cause browser memory pressure; for streaming workloads, a CLI tool such as csvkit is more appropriate.


Formulas

The parser operates as a finite state machine with three states:

S ∈ { UNQUOTED, QUOTED, QUOTE_ESCAPE }

Transition rules govern how each character c at position i is processed:

S → QUOTED                 if c = " and S = UNQUOTED at the start of a field
S → QUOTE_ESCAPE           if c = " and S = QUOTED
emit field, advance row    if c = \n and S ≠ QUOTED
emit field                 if c = d_src and S ≠ QUOTED
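The transition rules above can be sketched as a quote-aware parser. This is a minimal illustration under the stated state machine, not the tool's actual source; the function name `parseCsv` and the inline `\r\n` handling are assumptions.

```typescript
type State = "UNQUOTED" | "QUOTED" | "QUOTE_ESCAPE";

function parseCsv(text: string, delim: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let state: State = "UNQUOTED";

  const endField = () => { row.push(field); field = ""; };
  const endRow = () => { endField(); rows.push(row); row = []; };

  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (state === "QUOTED") {
      if (c === '"') state = "QUOTE_ESCAPE";            // closing quote or start of ""
      else field += c;                                   // delimiters and newlines are literal here
    } else if (state === "QUOTE_ESCAPE") {
      if (c === '"') { field += '"'; state = "QUOTED"; } // "" -> one literal double-quote
      else { state = "UNQUOTED"; i--; }                  // quote was closing; reprocess c
    } else { // UNQUOTED
      if (c === '"' && field === "") state = "QUOTED";   // opening quote at field start
      else if (c === delim) endField();
      else if (c === "\n") endRow();
      else if (c === "\r") { if (text[i + 1] === "\n") i++; endRow(); } // \r or \r\n
      else field += c;
    }
  }
  if (field !== "" || row.length > 0) endRow();          // flush the trailing row
  return rows;
}
```

Note how a quoted field with embedded delimiters (`"b,c"`) survives intact, while a naive `split(",")` would shift every subsequent column.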

Auto-detection scores each candidate delimiter d by computing its occurrence count per row outside quoted regions. The delimiter with the lowest variance in per-row counts and count > 0 is selected:

d_best = argmin_d σ²(counts_d),  subject to counts_d > 0
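This selection rule can be sketched as follows. `countOutsideQuotes` and `detectDelimiter` are illustrative names, not the tool's API; splitting the sample on newlines before counting ignores newlines embedded in quoted fields, a simplification of the real quote-aware scanner.

```typescript
// Count delimiter occurrences in one line, skipping quoted regions.
function countOutsideQuotes(line: string, delim: string): number {
  let count = 0, inQuotes = false;
  for (const c of line) {
    if (c === '"') inQuotes = !inQuotes;        // "" toggles twice, net no change
    else if (c === delim && !inQuotes) count++;
  }
  return count;
}

// Pick the candidate with the lowest per-row count variance (and count > 0).
function detectDelimiter(sample: string, candidates = [",", ";", "\t", "|"]): string {
  const rows = sample.split(/\r\n|\r|\n/).filter(r => r.length > 0).slice(0, 20);
  let best = ",", bestVar = Infinity;
  for (const d of candidates) {
    const counts = rows.map(r => countOutsideQuotes(r, d));
    const mean = counts.reduce((a, b) => a + b, 0) / counts.length;
    if (mean === 0) continue;                    // candidate never appears: skip
    const variance = counts.reduce((a, b) => a + (b - mean) ** 2, 0) / counts.length;
    if (variance < bestVar) { bestVar = variance; best = d; }
  }
  return best;
}
```

A true delimiter appears the same number of times on every row, so its variance is near zero; stray commas inside quoted fields inflate the variance of the wrong candidates.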

On output, a field f is wrapped in quotes when:

needsQuote(f) = TRUE if f contains d_target ∨ " ∨ \n ∨ \r

Where d_src is the source delimiter and d_target is the target delimiter chosen by the user. Internal double-quotes are escaped as "" per RFC 4180.
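The minimal-quoting rule and the "" escape can be sketched like this; `serializeField` and `serializeRows` are illustrative names for the re-serialization step, not the tool's actual functions.

```typescript
// Quote a field only when the needsQuote rule fires; double any internal quotes.
function serializeField(f: string, delim: string): string {
  const needsQuote = f.includes(delim) || f.includes('"') ||
                     f.includes("\n") || f.includes("\r");
  return needsQuote ? '"' + f.replace(/"/g, '""') + '"' : f;
}

// Re-join rows with the target delimiter and Unix LF line endings.
function serializeRows(rows: string[][], delim: string): string {
  return rows.map(r => r.map(f => serializeField(f, delim)).join(delim)).join("\n");
}
```

Quoting only when necessary keeps the output byte-for-byte minimal while remaining safe to re-parse.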

Reference Data

| Delimiter | Name | Common Use | Escape Character | RFC/Standard | Typical File Extension | Locale Association | Risk Level (Collision) |
|---|---|---|---|---|---|---|---|
| `,` | Comma | General CSV | `"` (double-quote) | RFC 4180 | .csv | US, UK, International | High (numeric decimals) |
| `;` | Semicolon | European CSV | `"` | De facto | .csv | DE, FR, IT, ES, BR | Low |
| `\t` | Tab | TSV files | `"` | IANA TSV | .tsv / .tab | Universal | Very Low |
| `\|` | Pipe | Database exports | `"` or `\` | None | .csv / .dat | Enterprise / Mainframe | Very Low |
| `:` | Colon | /etc/passwd, legacy | None standard | POSIX (passwd) | .dat / .txt | Unix/Linux | Medium (time values) |
| `^` | Caret | Fixed-width alternatives | `"` | None | .dat | Niche | Very Low |
| `~` | Tilde | EDI / Mainframe | None standard | X12 EDI | .edi / .dat | Enterprise | Very Low |
| `\x01` | SOH | Binary-safe exports | None needed | None | .dat | Enterprise | None |
| (space) | Space | Fixed-width text | `"` | None | .txt | Scientific data | Extreme |
| `\x1F` | Unit Separator | ASCII control char | None needed | ASCII | .dat | Legacy systems | None |
| `#` | Hash | Config files | `"` | None | .cfg / .dat | Niche | Medium (comments) |
| `\\` | Backslash | Path data exports | `"` | None | .dat | Windows paths | High |

Frequently Asked Questions

How does the tool handle delimiters inside quoted fields?

The parser uses a finite state machine with three states: UNQUOTED, QUOTED, and QUOTE_ESCAPE. When inside the QUOTED state, delimiter characters are treated as literal field content and never trigger a column split. This is compliant with RFC 4180 Section 2, Rule 6. A naive split() approach would corrupt such fields.

What happens when a field contains the new target delimiter?

During re-serialization, every field is checked against the target delimiter. If a field contains the target delimiter character, a double-quote, or a newline, it is automatically wrapped in double-quotes. Any existing double-quotes within the field are escaped by doubling them (""). This prevents column-shift corruption regardless of field content.

How does delimiter auto-detection work?

Auto-detection parses the first 20 rows using a quote-aware scanner. It counts occurrences of each candidate delimiter only when the parser state is UNQUOTED. It then computes the variance of per-row counts. A true delimiter produces a consistent count across rows (low variance), while commas inside quoted numeric fields are invisible to the counter. The candidate with the lowest non-zero variance wins.

Are empty fields preserved?

Yes. The parser emits every field, including empty strings between consecutive delimiters and at the end of rows. If a row ends with three tabs, three empty trailing fields are preserved. The re-serializer writes the exact number of delimiters needed to maintain column-count parity, so row-to-row column counts stay consistent.

How large a file can the tool handle?

The practical limit is approximately 50 MB. The FileReader API loads the entire file into a JavaScript string, consuming roughly 2x the file size in memory (UTF-16 encoding). For files exceeding this, the browser tab may run out of memory. The tool displays a warning for files over 10 MB and blocks files over 50 MB with an error message recommending a CLI tool such as csvkit or Miller.

How are different line endings handled?

The parser normalizes all line endings. It treats \r\n (Windows), \r (old Mac), and \n (Unix) identically as row terminators when outside a quoted field. On output, the tool uses \n (Unix LF) by default. Newlines embedded inside quoted fields are preserved exactly as they appear in the source data.

Can I convert data that uses a multi-character delimiter such as "||"?

The current implementation supports single-character delimiters only. Multi-character delimiters are not part of RFC 4180 and introduce ambiguity in quoting rules. If your data uses "||", consider first replacing "||" with a single unused character (such as the ASCII Unit Separator \x1F) in a text editor, then processing the file through this tool.
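The "||" workaround described above can be sketched as a one-line pre-processing step. `collapseMultiCharDelimiter` is a hypothetical helper, not part of the tool; a plain replacement is not quote-aware, so this is only safe when the source data contains no quoted fields.

```typescript
const US = "\x1F"; // ASCII Unit Separator: a single character unlikely to occur in text data

// Replace every occurrence of a multi-character delimiter with US,
// producing a single-character-delimited file this tool can process.
function collapseMultiCharDelimiter(text: string, multi: string): string {
  return text.split(multi).join(US);
}
```

After conversion, select `\x1F` as the source delimiter and the desired single character as the target.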