CSV to PSV Converter
Convert CSV (Comma-Separated Values) to PSV (Pipe-Separated Values) online. Handles quoted fields, embedded commas, and newlines per RFC 4180.
About
Delimiter conversion errors silently corrupt datasets. A single unescaped comma inside a field shifts every subsequent column, cascading bad data through pipelines, reports, and database imports. This tool implements a full RFC 4180-compliant finite-state-machine parser that correctly handles quoted fields containing commas, literal newline characters (LF or CRLF), and escaped double-quotes (""). The output replaces the comma delimiter with the pipe character | (Unicode U+007C), preserving quoting only where structurally necessary. Fields that contain pipe characters in the source data are automatically quoted in the PSV output to maintain parse integrity.
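One quick way to check the conversion behaviour described above is Python's standard `csv` module. This is not the tool's own FSM parser, but it follows the same RFC 4180 quoting rules on input. With `QUOTE_MINIMAL`, it quotes an output field only when that field contains the delimiter, the quote character, or a line-terminator character. `csv_to_psv` is a hypothetical helper name, and emitting LF line endings is an assumption.

```python
import csv
import io


def csv_to_psv(text: str) -> str:
    """Re-emit CSV text as PSV using Python's csv module (not the tool's own parser)."""
    # newline="" keeps CR/LF inside quoted fields untouched, as the csv docs recommend.
    reader = csv.reader(io.StringIO(text, newline=""))
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes only fields containing "|", '"', or a line-terminator character.
    writer = csv.writer(buf, delimiter="|", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(reader)
    return buf.getvalue()


sample = 'id,note\n1,"a, b"\n2,x|y\n3,"say ""hi"""\n4,"line1\nline2"\n'
print(csv_to_psv(sample))
```

In the output, the field `a, b` loses its quotes because a comma is no longer special. The field `x|y` gains quotes because it contains the new delimiter, and the embedded quotes and newline stay inside a quoted field.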
The parser runs in O(n) time, where n is the character count of the input. It does not split fields with regular expressions, because naive regex splitting breaks on quoted fields that span multiple lines. Limitation: the tool assumes UTF-8 input, and binary or other non-text files produce undefined output. Pro tip: always check row-length consistency after conversion. For well-formed input, a correct parse yields the same column count in every row, so a ragged row points to a malformed source line.
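Once the output has been parsed back into rows, the row-length check from the tip above takes only a few lines. This sketch assumes the rows are already available as lists of strings, for example from any RFC 4180 parser. It also assumes that the first row, often a header, defines the expected width. `ragged_rows` is a hypothetical helper name.

```python
def ragged_rows(rows: list[list[str]]) -> list[int]:
    """Return indices of rows whose column count differs from the first row's."""
    if not rows:
        return []
    expected = len(rows[0])  # assumption: the first row (often a header) sets the width
    return [i for i, row in enumerate(rows) if len(row) != expected]


parsed = [["id", "name"], ["1", "Ada"], ["2", "Lin", "extra"]]
print(ragged_rows(parsed))
```

An empty result means every row has the same width. Here the call reports index 2, the row with an extra field.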
Formulas
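The CSV parser uses a four-state finite-state machine. Each input character c moves the parser from its current state S to a next state S′, determined by S and the character class of c:

$$S' = \delta\bigl(S,\ \operatorname{class}(c)\bigr)$$

Here $\delta$ is the transition function, and $\operatorname{class}(c)$ distinguishes the comma, the double quote, CR or LF, and any other character.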
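The description above does not name the four states. This sketch assumes a common choice, `FIELD_START`, `UNQUOTED`, `QUOTED`, and `QUOTE_IN_QUOTED` (all hypothetical names), and shows that each character is consumed exactly once, which is where the O(n) bound comes from. It is an illustrative single-pass parser, not the tool's source. Rejecting a character after a closing quote is an assumed strictness choice.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical names for the four parser states.
    FIELD_START = auto()      # at the first character of a field
    UNQUOTED = auto()         # inside an unquoted field
    QUOTED = auto()           # inside a quoted field
    QUOTE_IN_QUOTED = auto()  # just read '"' while inside a quoted field


def parse_csv(text: str) -> list[list[str]]:
    """Single-pass FSM parse of RFC 4180-style CSV into rows of fields."""
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    state = State.FIELD_START
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if state is State.QUOTED:
            if c == '"':
                state = State.QUOTE_IN_QUOTED
            else:
                field.append(c)  # commas, CR and LF are literal inside quotes
        elif state is State.QUOTE_IN_QUOTED and c == '"':
            field.append('"')  # "" inside quotes is an escaped quote
            state = State.QUOTED
        elif state is State.FIELD_START and c == '"':
            state = State.QUOTED
        elif c == ",":
            row.append("".join(field))
            field.clear()
            state = State.FIELD_START
        elif c in "\r\n":
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # consume CRLF as a single record break
            row.append("".join(field))
            field.clear()
            rows.append(row)
            row = []
            state = State.FIELD_START
        elif state is State.QUOTE_IN_QUOTED:
            raise ValueError(f"unexpected {c!r} after closing quote at index {i}")
        else:
            field.append(c)  # ordinary character (a stray '"' mid-field is kept literally)
            state = State.UNQUOTED
        i += 1

    if state is State.QUOTED:
        raise ValueError("unterminated quoted field")
    if field or row or state is not State.FIELD_START:
        row.append("".join(field))  # flush the last record when there is no trailing newline
        rows.append(row)
    return rows
```

For example, `parse_csv('a,"b,c"\r\n"x\ny","say ""hi"""\n')` returns `[['a', 'b,c'], ['x\ny', 'say "hi"']]`: the quoted comma, the embedded LF, and the doubled quotes all stay inside their fields.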
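The output transformation replaces the delimiter character. For each parsed field $f_i$, the PSV output applies:

$$
\operatorname{out}(f_i) =
\begin{cases}
\operatorname{quote}(f_i) & \text{if } f_i \text{ contains a pipe, a double quote, CR, or LF} \\
f_i & \text{otherwise}
\end{cases}
$$

The transformed fields of each row are then joined with |.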
Where $S$ is the parser state, $c$ is the current input character, $f_i$ is the i-th parsed field value, and quote wraps the field in double quotes with internal quotes escaped as "". Time complexity is O(n), where n is the total character count.
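The per-field output rule can be sketched in a few lines. This sketch assumes a field must be quoted when it contains the pipe delimiter, a double quote, CR, or LF. The pipe case is stated in the About section, and the other three follow RFC 4180 quoting. `quote`, `format_field`, and `to_psv_row` are hypothetical helper names, not part of the tool.

```python
PSV_DELIMITER = "|"
# Assumed quoting triggers: the pipe itself plus the RFC 4180 special characters.
QUOTE_TRIGGERS = frozenset({PSV_DELIMITER, '"', "\r", "\n"})


def quote(field: str) -> str:
    """Wrap a field in double quotes, escaping each internal quote by doubling it."""
    return '"' + field.replace('"', '""') + '"'


def format_field(field: str) -> str:
    """Quote a field only when leaving it bare would break the PSV structure."""
    return quote(field) if any(ch in QUOTE_TRIGGERS for ch in field) else field


def to_psv_row(fields: list[str]) -> str:
    """Join already-parsed fields with the pipe delimiter."""
    return PSV_DELIMITER.join(format_field(f) for f in fields)


print(to_psv_row(["a, b", "x|y", 'say "hi"', "plain"]))
```

The call prints `a, b|"x|y"|"say ""hi"""|plain`. The embedded comma no longer needs protection, while the pipe and the quotes do.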
Reference Data
| Format | Delimiter | Unicode | Common Extension | RFC/Standard | Quoting Convention | Typical Use Case |
|---|---|---|---|---|---|---|
| CSV | , | U+002C | .csv | RFC 4180 | Double-quote (") | Spreadsheets, databases |
| PSV | \| | U+007C | .psv / .txt | No formal RFC | Double-quote (") | EDI, HL7, mainframes |
| TSV | \t | U+0009 | .tsv / .tab | IANA text/tab-separated-values | Rarely quoted | Bioinformatics, linguistics |
| SSV (Space) | (space) | U+0020 | .txt | None | Varies | Fixed-width legacy systems |
| Semicolon-SV | ; | U+003B | .csv | None (European locale CSV) | Double-quote (") | European Excel exports |
| Colon-SV | : | U+003A | .txt | None | Varies | /etc/passwd, config files |
| Caret-SV | ^ | U+005E | .txt | None | Varies | Legacy data interchange |
| Tilde-SV | ~ | U+007E | .txt | None | Varies | EDI X12 segments |
| JSON Lines | Newline | U+000A | .jsonl | jsonlines.org | N/A (structured) | Log streaming, ML datasets |
| Fixed Width | Column positions | N/A | .dat / .txt | Varies per schema | None | COBOL, mainframe batch |
| ASCII Unit Sep | US | U+001F | .txt | ASCII control chars | None needed | High-reliability interchange |
| ASCII Record Sep | RS | U+001E | .txt | ASCII control chars | None needed | Multi-record binary streams |