Change CSV Column Delimiter
Convert CSV column delimiters between comma, semicolon, tab, pipe, and custom characters. RFC 4180-compliant parser with delimiter auto-detection.
About
Switching a CSV file's column delimiter is deceptively error-prone. A naive find-and-replace corrupts any field that contains the delimiter character inside quoted text. This tool implements a full RFC 4180 state-machine parser that correctly handles quoted fields, escaped double-quotes (""), and embedded newlines before re-serializing with your target delimiter. It auto-detects the source delimiter by frequency analysis across the first 20 rows, counting only characters outside quoted regions.
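To make the failure mode concrete, here is a minimal sketch in Python using the standard-library `csv` module. It is a stand-in for illustration, not this tool's own parser, and the helper names `naive_convert` and `parsed_convert` are hypothetical. The sample is a semicolon-delimited file whose second row holds a European decimal (`3,14`) and a quoted field containing a semicolon.

```python
import csv
import io


def naive_convert(text: str, src: str, dst: str) -> str:
    # Blind substitution: also rewrites delimiters inside quoted fields
    # and never adds quotes where the new delimiter collides with data.
    return text.replace(src, dst)


def parsed_convert(text: str, src: str, dst: str) -> str:
    # Parse with a real CSV reader so quoted fields stay intact, then
    # re-serialize with the target delimiter and minimal quoting.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=src)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=dst, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


sample = 'price;note\n3,14;"a;b"\n'
print(naive_convert(sample, ";", ","), end="")
print(parsed_convert(sample, ";", ","), end="")
```

The naive version emits `3,14,"a,b"`, which a comma parser reads as three fields (`3`, `14`, `a,b`): a column shift plus a silently altered value. The parsed version emits `"3,14",a;b`, which reads back as the original two fields.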
Mishandling quoting rules when changing delimiters is a common cause of column-shift corruption in data pipelines. The risk is highest when moving data between European-locale systems, which typically use semicolons because the comma serves as their decimal separator, and US/international systems, which use commas. On output, the tool applies minimal quoting: a field is quoted only when it contains the new delimiter, a double-quote, or a newline character.
Limitations: files over 50 MB may cause browser memory pressure. For streaming workloads, a CLI tool such as csvkit is more appropriate.
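When a file is too large to hold in memory, the same parse-and-reserialize approach can run one record at a time. The following is a minimal Python sketch using the standard `csv` module (not csvkit); `convert_stream` is a hypothetical helper, and its memory use is bounded by the longest record rather than the file size.

```python
import csv
import io
from typing import TextIO


def convert_stream(src: TextIO, dst: TextIO, src_delim: str, dst_delim: str) -> int:
    """Copy CSV records from src to dst, changing the delimiter.

    Records are read and written one at a time, so the whole file is never
    held in memory. Returns the number of records written.
    """
    reader = csv.reader(src, delimiter=src_delim)
    writer = csv.writer(dst, delimiter=dst_delim, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    written = 0
    for record in reader:
        writer.writerow(record)
        written += 1
    return written


# In-memory demo. For real files, open both with newline="" as the csv docs
# recommend, e.g. open("in.csv", newline="", encoding="utf-8"); names are placeholders.
source = io.StringIO('a;b\n1,5;"x\ny"\n', newline="")
target = io.StringIO()
count = convert_stream(source, target, ";", ",")
print(count, repr(target.getvalue()))
```

The embedded newline in `x\ny` survives because the reader joins the physical lines of a quoted field into one record before the writer re-quotes it.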
Formulas
The parser operates as a finite state machine with three states:
Transition rules govern how each character c at position i is processed:
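Since the three states are not named here, the following Python sketch assumes a common three-state design: `UNQUOTED` (at the start of a field or inside an unquoted one), `QUOTED` (inside a quoted field), and `QUOTE_IN_QUOTED` (just read a `"` inside a quoted field, which is either the first half of an escaped `""` or the closing quote). The state names and `parse_csv` are hypothetical, and the tool's actual rules may differ, for example in how leniently it treats malformed input.

```python
from enum import Enum, auto


class State(Enum):
    UNQUOTED = auto()         # start of a field, or inside an unquoted field
    QUOTED = auto()           # inside a quoted field
    QUOTE_IN_QUOTED = auto()  # saw '"' inside a quoted field


def parse_csv(text: str, delim: str = ",") -> list[list[str]]:
    """Split text into records of fields following RFC 4180 quoting rules."""
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    state = State.UNQUOTED
    i = 0
    while i < len(text):
        c = text[i]
        if state is State.UNQUOTED:
            if c == '"' and not field:
                state = State.QUOTED          # opening quote at field start
            elif c == delim:
                row.append("".join(field))    # end of field
                field = []
            elif c in "\r\n":
                if c == "\r" and text[i + 1 : i + 2] == "\n":
                    i += 1                    # treat CRLF as one record terminator
                row.append("".join(field))    # end of field and of record
                rows.append(row)
                row, field = [], []
            else:
                field.append(c)
        elif state is State.QUOTED:
            if c == '"':
                state = State.QUOTE_IN_QUOTED
            else:
                field.append(c)               # delimiters and newlines are literal here
        else:  # State.QUOTE_IN_QUOTED
            if c == '"':
                field.append('"')             # "" is an escaped literal quote
                state = State.QUOTED
            else:
                state = State.UNQUOTED        # the previous '"' closed the field;
                continue                      # re-process c without advancing i
        i += 1
    # Flush a final record that lacks a trailing line break (or has an unclosed quote).
    if field or row or state is not State.UNQUOTED:
        row.append("".join(field))
        rows.append(row)
    return rows


print(parse_csv('id,comment\n1,"Says ""hi"", twice"\n'))
```

The `continue` in the last branch is the key transition: a quote followed by anything other than a second quote ends the quoted section, and that next character is then handled as ordinary unquoted input (usually a delimiter or line break).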
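Auto-detection scores each candidate delimiter $d$ by counting its occurrences outside quoted regions in each of the first $R$ rows ($R = 20$, or fewer for short files), giving per-row counts $n_d(1), \dots, n_d(R)$. Among candidates with a nonzero total count, the one with the lowest variance in per-row counts is selected:

$$d^{*} = \operatorname*{arg\,min}_{d \in D,\; \sum_{r=1}^{R} n_d(r) > 0} \; \frac{1}{R} \sum_{r=1}^{R} \left( n_d(r) - \bar{n}_d \right)^2, \qquad \bar{n}_d = \frac{1}{R} \sum_{r=1}^{R} n_d(r)$$

where $D$ is the set of candidate delimiters.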
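A minimal Python sketch of this scoring rule follows. The candidate set `",;\t|"`, the helper names, and the quote handling are assumptions: quotes are tracked with a simple toggle (an escaped `""` flips the state twice and so nets out), blank lines are skipped, and ties go to the earlier candidate. A stray `"` inside an unquoted field would confuse the toggle, which a full state-machine pass avoids.

```python
from statistics import pvariance
from typing import Optional


def per_row_counts(text: str, candidates: str, max_rows: int = 20) -> dict[str, list[int]]:
    """Count each candidate delimiter outside quotes, per non-blank row."""
    counts: dict[str, list[int]] = {d: [] for d in candidates}
    current = dict.fromkeys(candidates, 0)
    in_quotes = False
    row_has_text = False
    rows = 0

    def close_row() -> None:
        # Move the running per-row tallies into the result lists.
        for d in candidates:
            counts[d].append(current[d])
            current[d] = 0

    for c in text:
        if rows >= max_rows:
            break
        if c == '"':
            in_quotes = not in_quotes
            row_has_text = True
        elif c == "\n" and not in_quotes:
            if row_has_text:
                close_row()
                rows += 1
            row_has_text = False
        else:
            if c != "\r":
                row_has_text = True
            if not in_quotes and c in current:
                current[c] += 1
    if row_has_text and rows < max_rows:
        close_row()  # last row had no trailing newline
    return counts


def detect_delimiter(text: str, candidates: str = ",;\t|", max_rows: int = 20) -> Optional[str]:
    """Pick the candidate with the lowest per-row count variance among those that occur."""
    counts = per_row_counts(text, candidates, max_rows)
    best, best_var = None, None
    for d in candidates:
        row_counts = counts[d]
        if not row_counts or sum(row_counts) == 0:
            continue  # never appears outside quotes
        var = pvariance(row_counts)
        if best_var is None or var < best_var:
            best, best_var = d, var
    return best


print(repr(detect_delimiter('name;city;note\nAda;Paris;"a;b, c"\nBob;Lyon;x,y\n')))
```

Population and sample variance pick the same winner here, because every candidate is measured over the same number of rows.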
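On output, a field $f$ is wrapped in quotes when it contains the target delimiter, a double-quote, or a newline:

$$\mathrm{quote}(f) \iff d_{\mathrm{target}} \in f \;\lor\; \texttt{"} \in f \;\lor\; \mathtt{\backslash n} \in f$$

Here $d_{\mathrm{src}}$ is the source delimiter and $d_{\mathrm{target}}$ is the target delimiter chosen by the user. Internal double-quotes are escaped by doubling them ("") per RFC 4180.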
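The quoting rule translates directly into a short serializer. In this minimal Python sketch, `quote_field` and `serialize` are hypothetical names; treating `\r` as a line break alongside `\n`, and defaulting to CRLF record terminators (the line ending RFC 4180 specifies), are assumptions about the output format.

```python
def quote_field(field: str, delim: str) -> str:
    """Apply minimal quoting: wrap only when the field could be misread."""
    needs_quotes = delim in field or '"' in field or "\n" in field or "\r" in field
    if not needs_quotes:
        return field
    # Escape internal quotes by doubling them, then wrap the whole field.
    return '"' + field.replace('"', '""') + '"'


def serialize(rows: list[list[str]], delim: str, line_end: str = "\r\n") -> str:
    """Join already-parsed records with the target delimiter."""
    return "".join(delim.join(quote_field(f, delim) for f in row) + line_end for row in rows)


print(serialize([["a", "b;c", 'say "hi"'], ["1,5", "x"]], ";"), end="")
```

Note that `1,5` stays unquoted under a semicolon target: a field that contains only the source delimiter needs no protection once the delimiter changes.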
Reference Data
| Delimiter | Name | Common Use | Escape Character | RFC/Standard | Typical File Extension | Locale Association | Risk Level (Collision) |
|---|---|---|---|---|---|---|---|
| , | Comma | General CSV | " (double-quote) | RFC 4180 | .csv | US, UK, International | High (numeric decimals) |
| ; | Semicolon | European CSV | " | De facto | .csv | DE, FR, IT, ES, BR | Low |
| \t | Tab | TSV files | " (by convention; IANA TSV defines none) | IANA TSV | .tsv / .tab | Universal | Very Low |
| \| | Pipe | Database exports | " or \ | None | .csv / .dat | Enterprise / Mainframe | Very Low |
| : | Colon | /etc/passwd, legacy | No standard | None (Unix passwd convention) | .dat / .txt | Unix/Linux | Medium (time values) |
| ^ | Caret | Fixed-width alternatives | " | None | .dat | Niche | Very Low |
| ~ | Tilde | EDI / Mainframe | No standard | X12 EDI (commonly the segment terminator) | .edi / .dat | Enterprise | Very Low |
| \x01 | SOH | Binary-safe exports | None needed | None | .dat | Enterprise | None |
| Space | Space | Fixed-width text | " | None | .txt | Scientific data | Extreme |
| \x1F | Unit Separator | ASCII control char | None needed | ASCII | .dat | Legacy systems | None |
| # | Hash | Config files | " | None | .cfg / .dat | Niche | Medium (comments) |
| \\ | Backslash | Path data exports | " | None | .dat | Windows paths | High |
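The collision column gives typical risk; for a specific file it can help to measure it directly. This Python sketch (the `collision_report` name and the sample rows are hypothetical) counts, for each candidate target delimiter, how many already-parsed fields contain it. Each such field would need quoting on output, so a lower count means cleaner output for that choice.

```python
def collision_report(rows: list[list[str]], candidates: list[str]) -> dict[str, int]:
    """Count parsed fields that contain each candidate delimiter."""
    return {d: sum(d in field for row in rows for field in row) for d in candidates}


rows = [
    ["3,14", "Main St", "12:30"],
    ["2,71", "C:\\temp", "a|b"],
]
report = collision_report(rows, [",", ";", "\t", "|", ":", "\\"])
print(report)
print(repr(min(report, key=report.get)))  # first candidate with the fewest collisions
```

Here the comma and colon each collide twice (decimals, a time value, a drive letter), which mirrors the risk notes in the table, while semicolon and tab collide with nothing.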