
About

Tabular text data uses a single character - the delimiter - to mark column boundaries. The most common delimiters are the horizontal tab (\t, U+0009), comma (,), semicolon (;), and pipe (|). Choosing the wrong delimiter when importing data into a database, spreadsheet, or ETL pipeline silently corrupts column alignment: values shift right, numeric fields absorb text, and downstream queries return garbage. This tool performs real, field-aware conversion between any two delimiters. It implements RFC 4180 quoting rules: if a target delimiter or a double-quote character already exists inside a field value, the field is wrapped in double quotes and internal quotes are escaped as "". Auto-detection analyzes character frequency across the first 50 lines to identify the source delimiter without manual guessing.

Limitation: this tool treats each line as a flat record. It does not parse nested JSON within cells or handle multi-line quoted fields that span more than one physical line. For files exceeding 5 MB, processing is offloaded to a background thread to keep the interface responsive. Pro tip: if your source data uses a tab delimiter but was copy-pasted from a spreadsheet, verify that trailing tabs on short rows are preserved - some clipboard implementations strip them.
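The trailing-tab caveat can be verified directly: in Python, a delimiter-based split preserves trailing empty fields, while a generic whitespace split silently drops them (a sketch, not the tool's implementation):

```python
line = "id\tname\t"          # short row: the last column is empty

# Splitting on the delimiter keeps the trailing empty field.
print(line.split("\t"))      # ['id', 'name', '']

# A generic whitespace split silently drops the trailing column.
print(line.split())          # ['id', 'name']
```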


Formulas

Delimiter conversion follows a deterministic two-phase process: parse, then serialize. Each line of the input is split into an ordered field array, then rejoined with the target delimiter.

fields = split(line, d_src)
output = join(quote(fields, d_tgt), d_tgt)
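The two phases can be sketched in Python (`convert_line` and `quote` are illustrative names; this assumes flat, single-line records as described above):

```python
def quote(field: str, d_tgt: str) -> str:
    """RFC 4180: wrap and escape only when the field needs it."""
    if d_tgt in field or '"' in field or "\n" in field:
        return '"' + field.replace('"', '""') + '"'
    return field

def convert_line(line: str, d_src: str, d_tgt: str) -> str:
    """Parse with the source delimiter, then serialize with the target one."""
    fields = line.split(d_src)                          # phase 1: parse
    return d_tgt.join(quote(f, d_tgt) for f in fields)  # phase 2: serialize

print(convert_line("a\tb,c\tend", "\t", ","))  # → a,"b,c",end
```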

The quoting function applies RFC 4180 rules conditionally:

quote(field) = "field"   if field contains d_tgt, a double quote, or a newline
             = field     otherwise

(internal double quotes are doubled to "" before the field is wrapped)
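Python's standard csv module applies the same conditional rule via csv.QUOTE_MINIMAL, which serves as an independent cross-check of the rule above (an illustration, not the tool's own code):

```python
import csv
import io

# Fields exercising each quoting trigger: target delimiter, quote, newline.
rows = [["plain", "has,comma", 'has "quotes"', "multi\nline"]]
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n").writerows(rows)
print(buf.getvalue())
# plain,"has,comma","has ""quotes""","multi
# line"
```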

Auto-detection scores each candidate delimiter by counting occurrences per line and computing consistency:

score(d) = count / (1 + σ(counts))

Where d_src = source delimiter, d_tgt = target delimiter, σ(counts) = standard deviation of per-line occurrence counts (lower variance means more consistent column structure), and count = mean occurrences per line. The delimiter with the highest score wins.
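A minimal Python sketch of this scoring rule; the function name and candidate set are illustrative assumptions:

```python
import statistics

def detect_delimiter(lines, candidates=("\t", ",", ";", "|", ":")) -> str:
    """Score each candidate by mean count / (1 + stdev); highest score wins."""
    sample = lines[:50]                      # analyze the first 50 lines only
    best, best_score = "\t", 0.0             # tab is the TSV-focused default
    for d in candidates:
        counts = [line.count(d) for line in sample]
        mean = statistics.mean(counts)
        sigma = statistics.pstdev(counts)    # per-line occurrence variance
        score = mean / (1 + sigma)
        if score > best_score:
            best, best_score = d, score
    return best

print(detect_delimiter(["a,b,c", "1,2,3", "4,5,6"]))  # → ,
```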

Reference Data

| Delimiter Name | Character | Unicode | Common File Extension | Typical Use Case | RFC / Standard | Quoting Risk |
|---|---|---|---|---|---|---|
| Tab | \t | U+0009 | .tsv, .tab | Database exports, UNIX utilities | IANA text/tab-separated-values | Low - rarely appears in data |
| Comma | , | U+002C | .csv | Spreadsheets, CRM exports | RFC 4180 | High - common in text & numbers |
| Semicolon | ; | U+003B | .csv (EU locale) | European Excel, SAP exports | No formal RFC | Medium |
| Pipe | \| | U+007C | .psv, .dat | EDI, HL7 health data, mainframes | HL7 v2.x | Low |
| Colon | : | U+003A | /etc/passwd | UNIX config files | POSIX | Medium - in timestamps |
| Tilde | ~ | U+007E | .dat | Legacy banking, NACHA files | NACHA/ACH | Very low |
| Caret | ^ | U+005E | .dat | Mainframe flat files | None | Very low |
| Space | (space) | U+0020 | .txt, .asc | Fixed-width fallback, scientific logs | None | Extreme - appears everywhere |
| Unit Separator | US | U+001F | .dat | ASCII control, binary-safe delimiting | ISO 646 | None - invisible character |
| Record Separator | RS | U+001E | .dat | Multi-record ASCII streams | ISO 646 | None |
| Null | NUL | U+0000 | (none) | Binary streams, C-string termination, xargs -0 | POSIX | None |
| SOH | SOH | U+0001 | .hl7 | HL7 sub-component separator | HL7 v2.x | None |
| Double Pipe | \|\| | Two U+007C | .dat | Custom enterprise integrations | None | Very low |
| Hash | # | U+0023 | .dat | Legacy telecom CDR files | None | Low |
| At Sign | @ | U+0040 | .dat | Custom log formats | None | Low |

Frequently Asked Questions

How does delimiter auto-detection work?

The tool samples the first 50 lines and counts occurrences of each candidate delimiter (tab, comma, semicolon, pipe, colon) per line. It then computes the mean count and standard deviation for each candidate. A consistent column structure produces low variance and a non-zero mean. The candidate with the highest score - defined as mean divided by (1 + standard deviation) - is selected. If all scores are zero or tied, tab is assumed as the default since the tool is TSV-focused.
What happens if a field already contains the target delimiter?

The tool applies RFC 4180 quoting: any field containing the target delimiter, a double-quote character, or a newline is wrapped in double quotes. Existing double quotes within the field are escaped by doubling them (e.g., a field containing He said "hello" becomes "He said ""hello"""). This ensures the output can be re-parsed without ambiguity by any compliant parser.
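The escaping round-trips cleanly; here the field from the example is serialized and then re-parsed with Python's standard csv module (an independent check, not the tool itself):

```python
import csv
import io

field = 'He said "hello"'
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow([field])
print(buf.getvalue())    # "He said ""hello"""

# Re-parsing recovers the original field exactly.
assert next(csv.reader(io.StringIO(buf.getvalue()))) == [field]
```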
Can the tool handle rows with inconsistent field counts?

Yes. Each line is split independently. If row 5 has 3 fields and row 6 has 7 fields, both are converted faithfully. The tool does not enforce rectangular structure. However, this inconsistency may indicate a parsing problem upstream - for example, a multi-line quoted field that was not properly handled by the source system. The tool reports the detected column count range in the output summary.
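A sketch of this per-line independence with a field-count summary (names are illustrative; RFC 4180 quoting is omitted here for brevity):

```python
def convert_ragged(lines, d_src, d_tgt):
    """Convert each line independently; report the field-count range."""
    out, counts = [], []
    for line in lines:
        fields = line.split(d_src)
        counts.append(len(fields))
        out.append(d_tgt.join(fields))   # quoting omitted for brevity
    return out, (min(counts), max(counts))

rows, span = convert_ragged(["a\tb\tc", "x\ty\tz\tq\tr\ts\tt"], "\t", "|")
print(rows)   # ['a|b|c', 'x|y|z|q|r|s|t']
print(span)   # (3, 7) - the detected column count range
```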
Does RFC 4180 quoting slow down conversion?

Minimally. The quoting check is an O(n) scan of each field value for the presence of the target delimiter or quote character. For a 100,000-line file with 10 columns, this adds roughly 1 million short string searches - completing in under 50 ms on modern hardware. The dominant cost is string concatenation for the output, which the tool optimizes by pre-allocating array joins rather than repeated string appends.
Why do some systems prefer pipe or tilde over comma?

Commas appear frequently in natural text (addresses, descriptions, numbers with thousand separators in some locales). Every embedded comma forces quoting, inflating file size and complicating downstream parsing. Pipe (|) and tilde (~) rarely appear in real data, eliminating the need for quoting entirely. This is why EDI standards (ANSI X12), HL7 health records, and many mainframe systems chose non-comma delimiters decades ago.
How are line endings handled?

The tool normalizes all line endings to the format you select: LF (Unix/Mac, \n), CRLF (Windows, \r\n), or CR (legacy Mac, \r). Input is split using a regex that matches any of these three patterns. Default output uses LF. If you plan to open the output in Windows Notepad (pre-2018 versions), select CRLF.
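The three line-ending patterns can be matched with a single regex, as the answer describes; in Python (a sketch, with the alternation ordered so \r\n matches before its component characters):

```python
import re

text = "row1\r\nrow2\rrow3\nrow4"            # mixed CRLF, CR, and LF endings
lines = re.split(r"\r\n|\r|\n", text)        # order matters: try \r\n first
print(lines)                                 # ['row1', 'row2', 'row3', 'row4']
print("\n".join(lines))                      # re-joined with LF (the default)
```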