Change TSV Column Delimiter
Convert TSV files to CSV, pipe-delimited, or any custom separator. Handles quoted fields, auto-detects delimiters, and processes large files instantly.
About
Tabular text data uses a single character - the delimiter - to mark column boundaries. The most common delimiters are the horizontal tab (\t, U+0009), comma (,), semicolon (;), and pipe (|). Choosing the wrong delimiter when importing data into a database, spreadsheet, or ETL pipeline silently corrupts column alignment: values shift right, numeric fields absorb text, and downstream queries return garbage. This tool performs real, field-aware conversion between any two delimiters. It implements RFC 4180 quoting rules: if a target delimiter or a double-quote character already exists inside a field value, the field is wrapped in double quotes and internal quotes are escaped as "". Auto-detection analyzes character frequency across the first 50 lines to identify the source delimiter without manual guessing.
Limitation: this tool treats each line as a flat record. It does not parse nested JSON within cells or handle multi-line quoted fields that span more than one physical line. For files exceeding 5 MB, processing is offloaded to a background thread to keep the interface responsive. Pro tip: if your source data uses a tab delimiter but was copy-pasted from a spreadsheet, verify that trailing tabs on short rows are preserved - some clipboard implementations strip them.
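The field-aware conversion described above can be sketched with Python's standard csv module, which already implements RFC 4180 quoting. This is an illustrative sketch, not the tool's actual implementation; the function name convert_delimiter is hypothetical.

```python
import csv
import io

def convert_delimiter(text: str, src: str, tgt: str) -> str:
    """Field-aware delimiter conversion: parse with src, serialize with tgt.

    csv.writer with QUOTE_MINIMAL applies RFC 4180 quoting: any field that
    contains tgt or a double quote is wrapped in quotes, and embedded
    quotes are doubled to "".
    """
    out = io.StringIO()
    reader = csv.reader(io.StringIO(text), delimiter=src)
    writer = csv.writer(out, delimiter=tgt, quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

# A tab-separated line whose second field contains a comma:
tsv = 'id\tname\n1\tDoe, Jane\n'
print(convert_delimiter(tsv, '\t', ','))  # id,name / 1,"Doe, Jane"
```

Note that csv.reader only accepts a single-character delimiter, so a multi-character separator such as a double pipe would need a manual split step.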
Formulas
Delimiter conversion follows a deterministic two-phase process: parse, then serialize. Each line of the input is split into an ordered field array, then rejoined with the target delimiter.
The quoting function applies RFC 4180 rules conditionally:

quote(f) = '"' + escape(f) + '"' if f contains dtgt or a double quote; otherwise quote(f) = f. Here escape(f) is f with every embedded double quote doubled ("" for ").
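A minimal standalone version of this conditional quoting rule, assuming single-character delimiters (quote_field is an illustrative name, not part of the tool):

```python
def quote_field(field: str, tgt: str) -> str:
    # RFC 4180: wrap the field in double quotes only if it contains the
    # target delimiter or a double quote; escape embedded quotes by
    # doubling them.
    if tgt in field or '"' in field:
        return '"' + field.replace('"', '""') + '"'
    return field

quote_field('Doe, Jane', ',')   # → '"Doe, Jane"'
quote_field('plain', ',')       # → 'plain'
quote_field('say "hi"', ',')    # → '"say ""hi"""'
```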
Auto-detection scores each candidate delimiter d by counting occurrences per line and computing consistency:

score(d) = μ / (1 + σ)
Where dsrc = source delimiter, dtgt = target delimiter, σ = standard deviation of per-line occurrence counts (lower variance means more consistent column structure), and μ = mean occurrences per line. The delimiter with the highest score wins.
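A scoring heuristic of this kind - mean occurrences per line damped by their standard deviation - can be sketched as follows. The candidate list, sample size, and function name detect_delimiter are assumptions for illustration; the sketch also counts delimiters inside quoted fields, a simplification a production detector would avoid.

```python
from statistics import mean, pstdev

CANDIDATES = ['\t', ',', ';', '|']

def detect_delimiter(text: str, sample_lines: int = 50) -> str:
    """Pick the candidate with the highest mean/(1 + stdev) score over
    the first sample_lines non-empty lines: frequent AND consistent wins."""
    lines = [ln for ln in text.splitlines()[:sample_lines] if ln]
    if not lines:
        return ','  # arbitrary fallback for empty input
    best, best_score = ',', 0.0
    for d in CANDIDATES:
        counts = [ln.count(d) for ln in lines]
        mu, sigma = mean(counts), pstdev(counts)
        score = mu / (1 + sigma)
        if score > best_score:
            best, best_score = d, score
    return best

detect_delimiter('a\tb\tc\n1\t2\t3\n')   # → '\t'
detect_delimiter('a;b;c\nx;y;z\n')       # → ';'
```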
Reference Data
| Delimiter Name | Character | Unicode | Common File Extension | Typical Use Case | RFC / Standard | Quoting Risk |
|---|---|---|---|---|---|---|
| Tab | \t | U+0009 | .tsv, .tab | Database exports, UNIX utilities | IANA text/tab-separated-values | Low - rarely appears in data |
| Comma | , | U+002C | .csv | Spreadsheets, CRM exports | RFC 4180 | High - common in text & numbers |
| Semicolon | ; | U+003B | .csv (EU locale) | European Excel, SAP exports | No formal RFC | Medium |
| Pipe | \| | U+007C | .psv, .dat | EDI, HL7 health data, mainframes | HL7 v2.x | Low |
| Colon | : | U+003A | /etc/passwd | UNIX config files | POSIX | Medium - in timestamps |
| Tilde | ~ | U+007E | .dat | Legacy banking, NACHA files | NACHA/ACH | Very low |
| Caret | ^ | U+005E | .dat | Mainframe flat files | None | Very low |
| Space | (space) | U+0020 | .txt, .asc | Fixed-width fallback, scientific logs | None | Extreme - appears everywhere |
| Unit Separator | US | U+001F | .dat | ASCII control, binary-safe delimiting | ISO 646 | None - invisible character |
| Record Separator | RS | U+001E | .dat | Multi-record ASCII streams | ISO 646 | None |
| Null | NUL | U+0000 | (none) | Binary streams, C-string termination, xargs -0 | POSIX | None |
| SOH | SOH | U+0001 | .fix | FIX protocol field separator | FIX 4.x | None |
| Double Pipe | \|\| | Two U+007C | .dat | Custom enterprise integrations | None | Very low |
| Hash | # | U+0023 | .dat | Legacy telecom CDR files | None | Low |
| At Sign | @ | U+0040 | .dat | Custom log formats | None | Low |