About

Switching column delimiters in structured text is deceptively error-prone. A naive find-and-replace fails when the source delimiter appears inside quoted fields, when consecutive delimiters represent empty columns, or when mixed line endings corrupt row boundaries. Mishandling these edge cases corrupts datasets: columns merge, values shift, or fields are silently dropped. This tool implements RFC 4180-aware parsing that respects double-quoted fields: a comma inside "New York, NY" is preserved literally, not treated as a column break. It auto-detects the input delimiter by checking how consistently each candidate character splits the first 20 rows into the same number of columns.
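To make the failure mode concrete, here is a minimal Python sketch contrasting a naive replacement with a quote-aware conversion. It uses Python's standard `csv` module as a stand-in for the tool's own parser, which is an assumption; `convert` is a hypothetical helper name.

```python
import csv
import io

text = 'name,city\n"Smith, John","New York, NY"\n'

# Naive replacement rewrites every comma, including those inside quotes,
# so the second row ends up with four columns instead of two.
naive = text.replace(",", "\t")

def convert(data: str, src: str, dst: str) -> str:
    """Parse with the source delimiter, then re-serialize with the target."""
    reader = csv.reader(io.StringIO(data, newline=""), delimiter=src)
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=dst, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

converted = convert(text, ",", "\t")
```

In `converted`, the commas inside the two quoted fields survive as data and only the real column break becomes a tab; in `naive`, the quoted fields are split apart.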

The converter handles tab (\t), comma, semicolon, pipe (|), colon, space, and arbitrary multi-character delimiters. Options for trimming cell whitespace and collapsing consecutive delimiters give fine control over output formatting. Limitation: this tool processes plain-text columnar data. It does not parse binary formats like .xlsx or fixed-width column layouts. Pro tip: European CSV files often use semicolons because the comma serves as a decimal separator in those locales.

Formulas

The core operation is a parse-then-serialize pipeline. Each row of input text is tokenized respecting quoted fields, then re-joined with the target delimiter.

$$\text{output} = \operatorname{join}\bigl(\operatorname{parse}(\text{row},\ d_{\text{src}}),\ d_{\text{tgt}}\bigr)$$

Where $d_{\text{src}}$ is the source delimiter and $d_{\text{tgt}}$ is the target delimiter. The parse function implements a finite state machine with three states:

$$S \in \{\text{FIELD\_START},\ \text{IN\_QUOTED},\ \text{IN\_UNQUOTED}\}$$

When $S = \text{IN\_QUOTED}$, delimiter characters are treated as literal content. A double-quote inside a quoted field is escaped as "" per RFC 4180. Auto-detection scores each candidate delimiter $d_c$ by combining the variance and the mean of the column counts across the sampled rows:

$$\text{score}(d_c) = \frac{1}{\sigma^2(\text{counts}) + 1} \times \overline{\text{count}}$$

The delimiter with the highest score (lowest variance and highest mean count) wins. A perfect delimiter produces identical column counts on every row, yielding $\sigma^2 = 0$.
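As a rough illustration of this scoring rule, the sketch below scores each candidate as the mean column count divided by the population variance plus one, and picks the best. The names `CANDIDATES`, `score`, and `detect` are hypothetical, and counting columns with Python's `csv.reader` is an assumption rather than the tool's actual implementation.

```python
import csv
from statistics import mean, pvariance

# Candidate order doubles as the tiebreaker: max() keeps the first best match,
# so comma wins when several candidates score the same.
CANDIDATES = [",", "\t", ";", "|", ":", " "]

def score(lines, delim):
    """Mean column count divided by (variance of column counts + 1)."""
    counts = [len(row) for row in csv.reader(lines, delimiter=delim)]
    if not counts:
        return 0.0
    return mean(counts) / (pvariance(counts) + 1)

def detect(text, sample_rows=20):
    """Return the highest-scoring candidate over the first sample_rows lines."""
    lines = text.splitlines()[:sample_rows]
    return max(CANDIDATES, key=lambda d: score(lines, d))
```

For a semicolon-separated file with decimal commas, such as `name;price` followed by `widget;1,50`, the comma yields uneven column counts and a lower score, so the semicolon is selected.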

Reference Data

| Delimiter | Symbol | Escape Sequence | Common Format | RFC / Standard | Typical Use Case |
| --- | --- | --- | --- | --- | --- |
| Comma | , | Quoted field | CSV | RFC 4180 | Spreadsheets, data export |
| Tab | \t | Rarely needed | TSV | IANA TSV | Bioinformatics, database dumps |
| Semicolon | ; | Quoted field | CSV (EU) | De facto | European locale CSV |
| Pipe | \| | Backslash or quote | PSV | HL7, EDI | Healthcare, legacy systems |
| Colon | : | Backslash | /etc/passwd | POSIX | Unix config files |
| Space | (space) | Quoted field | SSV | None | Log files, CLI output |
| Tilde | ~ | None standard | Custom | None | Legacy mainframe exports |
| Caret | ^ | None standard | Custom | None | Special data feeds |
| Double Pipe | \|\| | None standard | Custom | None | Multi-char delimited logs |
| SOH (\x01) | ^A | N/A | FIX Protocol | FIX 4.x | Financial trading messages |
| Unit Sep (\x1F) | US | N/A | ASCII delimited | ISO 646 | Data interchange |
| Null (\0) | NUL | N/A | xargs -0 | POSIX | Filenames with spaces |
| Hash | # | None standard | Custom | None | Color codes, config |
| Ampersand | & | None standard | Query string | RFC 3986 | URL parameters |
| Equals | = | URL encoding | Key-value | RFC 3986 | Config files, env vars |

Frequently Asked Questions

How does the converter handle delimiters inside quoted fields?
The parser implements a finite state machine per RFC 4180. When it encounters an opening double-quote at field start, it enters the IN_QUOTED state. All characters, including the source delimiter, are treated as literal field content until a closing quote is found. A literal quote inside a quoted field must be escaped as two consecutive quotes (""). This means a field like "Smith, John" preserves the comma as data, not as a column separator.
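The three-state machine described here can be sketched in Python roughly as follows. This is an illustrative reconstruction, not the tool's source: `State` and `parse_row` are hypothetical names, the sketch handles a single row, and it does not cover quoted fields that span multiple lines.

```python
from enum import Enum, auto

class State(Enum):
    FIELD_START = auto()
    IN_QUOTED = auto()
    IN_UNQUOTED = auto()

def parse_row(row: str, delim: str = ",") -> list[str]:
    """Split one row into fields, treating delimiters inside quotes as data."""
    if not delim:
        raise ValueError("delimiter must be a non-empty string")
    fields, buf = [], []
    state = State.FIELD_START
    i = 0
    while i < len(row):
        ch = row[i]
        if state is State.IN_QUOTED:
            if ch == '"':
                if row[i + 1 : i + 2] == '"':  # "" is an escaped quote
                    buf.append('"')
                    i += 1
                else:                          # closing quote ends quoting
                    state = State.IN_UNQUOTED
            else:
                buf.append(ch)                 # delimiters are literal here
        elif row.startswith(delim, i):         # field boundary (any length)
            fields.append("".join(buf))
            buf = []
            state = State.FIELD_START
            i += len(delim) - 1
        elif ch == '"' and state is State.FIELD_START:
            state = State.IN_QUOTED            # opening quote at field start
        else:
            buf.append(ch)
            state = State.IN_UNQUOTED
        i += 1
    fields.append("".join(buf))                # flush the last field
    return fields
```

Because the boundary check uses `str.startswith`, the same loop also accepts multi-character delimiters such as `||`.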
What happens to consecutive delimiters?
By default, consecutive delimiters are preserved as empty fields. For example, the input a,,c with a comma delimiter produces three fields: a, (empty), and c. If you enable the "Collapse consecutive delimiters" option, adjacent delimiters are merged into one, which is useful for space-separated log files where multiple spaces align columns visually but represent a single separator.
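The difference between the two modes can be shown with a short Python sketch. `split_fields` is a hypothetical helper, and for brevity it ignores quoting; a real converter would apply the collapse inside its quote-aware parser.

```python
import re

def split_fields(line: str, delim: str, collapse: bool = False) -> list[str]:
    """Split a line on delim, optionally merging runs of adjacent delimiters."""
    if collapse:
        # One or more back-to-back delimiters count as a single separator.
        pattern = f"(?:{re.escape(delim)})+"
        return re.split(pattern, line)
    # Default: every delimiter is a boundary, so empty fields are preserved.
    return line.split(delim)
```

With the default, `split_fields("a,,c", ",")` keeps the empty middle field; with `collapse=True`, a visually aligned log line like `GET   /index  200` splits into exactly three fields.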
Can I use a custom or multi-character delimiter?
Yes. Select "Custom" from the delimiter dropdown and type any string, including multi-character sequences like || or :: or even words like DELIM. The parser splits on the exact string match. Note that multi-character delimiters cannot be auto-detected; you must specify them manually.
How does auto-detection work, and when can it fail?
Auto-detection samples the first 20 rows and tests six candidates: comma, tab, semicolon, pipe, colon, and space. It calculates the variance of column counts per candidate. A delimiter that produces consistent column counts (variance near zero) with at least 2 columns scores highest. It works reliably for well-structured data but may fail on single-column input, heavily irregular files, or when multiple candidates produce identical scores. In ambiguous cases, comma is preferred as the tiebreaker because RFC 4180-style CSV is the most common format.
Does the output quote fields that contain the target delimiter?
Yes. After parsing, the serializer checks each field for the presence of the target delimiter, double-quotes, or newline characters. If any are found, the field is wrapped in double-quotes and internal quotes are escaped as "". This ensures the output is valid for re-import into any RFC 4180-compliant parser.
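A minimal Python sketch of this quoting rule follows. The helper names `quote_field` and `serialize_row` are hypothetical, and treating a bare carriage return as a newline character is an assumption based on RFC 4180's CRLF line breaks.

```python
def quote_field(field: str, delim: str) -> str:
    """Wrap a field in double-quotes only when it would otherwise be ambiguous."""
    needs_quoting = (
        delim in field      # contains the target delimiter
        or '"' in field     # contains a double-quote
        or "\n" in field    # contains a line feed
        or "\r" in field    # contains a carriage return (assumption, see above)
    )
    if needs_quoting:
        return '"' + field.replace('"', '""') + '"'
    return field

def serialize_row(fields: list[str], delim: str) -> str:
    """Join fields with the target delimiter, quoting where required."""
    return delim.join(quote_field(f, delim) for f in fields)
```

Note that quoting depends on the target delimiter: "New York, NY" must be quoted in comma-separated output but can be written bare in tab-separated output.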
Is there a file size limit?
The tool processes text entirely in browser memory. Practical limits depend on the device, typically 50-200 MB of raw text on modern desktops. For files exceeding roughly 50,000 lines, the tool uses chunked processing with UI progress feedback to prevent the browser from becoming unresponsive. There is no server upload; all processing is local and private.