About

Delimiter conversion errors silently corrupt datasets. A single unescaped comma inside a field shifts every subsequent column, cascading bad data through pipelines, reports, and database imports. This tool implements a full RFC 4180-compliant finite-state-machine parser that correctly handles quoted fields containing commas, literal newline characters (LF or CRLF), and escaped double-quotes (""). The output replaces the comma delimiter with the pipe character | (Unicode U+007C), preserving quoting only where structurally necessary. Fields that contain pipe characters in the source data are automatically quoted in the PSV output to maintain parse integrity.
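Here is a small illustration of how one comma shifts the columns. It is a Python sketch that compares naive comma splitting with the standard library's quote-aware csv module. The csv module is only a stand-in here, not this tool's parser, and the sample record is made up.

```python
import csv
import io

# A record whose second field contains a comma inside double quotes.
line = 'id,"Smith, John",42\n'

# Naive splitting treats every comma as a delimiter, so the quoted field
# is torn in two and every later column shifts right by one.
naive = line.rstrip("\n").split(",")

# A quote-aware parser keeps the quoted comma inside its field.
proper = next(csv.reader(io.StringIO(line, newline="")))

print(naive)   # ['id', '"Smith', ' John"', '42']
print(proper)  # ['id', 'Smith, John', '42']
```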

The parser operates in O(n) time, where n is the character count. It does not split fields with regular expressions, because regex-based splitting fails on quoted fields that span multiple lines. Limitation: this tool assumes UTF-8 encoding. Binary or non-text files will produce undefined output. Pro tip: always validate row-length consistency after conversion. For well-formed input, a correct parse produces the same column count on every row.
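Undefined output on non-UTF-8 input can be avoided by checking the bytes before parsing. The sketch below is a hypothetical pre-check, not part of the tool as documented. It rejects bytes that are not valid UTF-8 and strips a leading byte-order mark. As a rough, assumed heuristic for binary content, it also rejects any NUL character.

```python
def decode_utf8_strict(data: bytes) -> str:
    """Decode raw file bytes as UTF-8, failing loudly instead of guessing.

    Hypothetical pre-check: rejects invalid UTF-8 and, as a rough
    heuristic for binary content, any NUL character.
    """
    try:
        # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if present.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise ValueError("input is not valid UTF-8") from exc
    if "\x00" in text:
        raise ValueError("NUL character found; input looks binary")
    return text
```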

Formulas

The CSV parser uses a four-state finite-state machine. Each input character c transitions the parser between states based on the current state S and the character class.

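$$
S \to
\begin{cases}
\text{FIELD\_START} & \text{at the beginning of a row or field} \\
\text{IN\_UNQUOTED} & \text{if } c \neq \texttt{"} \text{ at field start} \\
\text{IN\_QUOTED} & \text{if } c = \texttt{"} \text{ at field start} \\
\text{QUOTE\_IN\_QUOTED} & \text{if } c = \texttt{"} \text{ inside a quoted field}
\end{cases}
$$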
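Below is a minimal Python sketch of a four-state machine like the one above. It makes one pass over the text and is O(n) in the character count. The state names come from the formula. Several details are assumptions, not the tool's documented behavior:

- the function name `parse_csv`;
- treating a lone CR as a row break;
- keeping stray characters after a closing quote instead of rejecting them;
- raising on an unterminated quoted field.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()      # at the beginning of a row or field
    IN_UNQUOTED = auto()      # inside a field that did not start with "
    IN_QUOTED = auto()        # inside a field that started with "
    QUOTE_IN_QUOTED = auto()  # just saw " inside a quoted field


def parse_csv(text: str, delim: str = ",") -> list[list[str]]:
    """Single-pass, O(n) parse of RFC 4180-style text into rows of fields."""
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    state = State.FIELD_START

    def end_field() -> None:
        row.append("".join(field))
        field.clear()

    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if state is State.IN_QUOTED:
            if c == '"':
                state = State.QUOTE_IN_QUOTED  # closing quote or first half of ""
            else:
                field.append(c)  # delimiters and newlines are literal here
        elif state is State.QUOTE_IN_QUOTED and c == '"':
            field.append('"')  # "" is an escaped double quote
            state = State.IN_QUOTED
        elif state is State.FIELD_START and c == '"':
            state = State.IN_QUOTED
        elif c == delim:
            end_field()
            state = State.FIELD_START
        elif c in "\r\n":
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # consume CRLF as a single row break
            end_field()
            rows.append(row.copy())
            row.clear()
            state = State.FIELD_START
        else:
            # Ordinary character (including a stray quote in an unquoted field).
            # After a closing quote this is lenient: the character is kept.
            field.append(c)
            state = State.IN_UNQUOTED
        i += 1

    if state is State.IN_QUOTED:
        raise ValueError("unterminated quoted field")
    if state is not State.FIELD_START or row:
        end_field()  # flush a final row that has no trailing newline
        rows.append(row)
    return rows
```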

The output transformation replaces the delimiter character. For each parsed field f_i, the PSV output applies:

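$$
\text{output}(f_i) =
\begin{cases}
\operatorname{quote}(f_i) & \text{if } f_i \text{ contains } \texttt{|},\ \texttt{"},\ \text{or a newline} \\
f_i & \text{otherwise}
\end{cases}
$$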

Here S is the parser state, c is the current input character, f_i is the i-th parsed field value, and quote wraps a field in double quotes, escaping each internal double quote as "". Time complexity is O(n), where n is the total character count.
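The output rule can be written directly as code. In this Python sketch, "newline" is taken to mean either CR or LF, and the helper names `quote`, `to_psv_field`, and `to_psv_row` are hypothetical.

```python
def quote(field: str) -> str:
    """Wrap a field in double quotes, escaping internal quotes as ""."""
    return '"' + field.replace('"', '""') + '"'


def to_psv_field(field: str) -> str:
    # Quote only when the field contains |, ", CR, or LF; otherwise emit as-is.
    if any(ch in field for ch in '|"\r\n'):
        return quote(field)
    return field


def to_psv_row(fields: list[str]) -> str:
    # Join the (possibly quoted) fields with the pipe delimiter.
    return "|".join(to_psv_field(f) for f in fields)
```

For example, `to_psv_row(["a", "b|c", 'say "hi"', "x\ny", ""])` yields `a|"b|c"|"say ""hi"""|"x\ny"|`. Fields with no special characters pass through unquoted, and the empty last field becomes a trailing pipe.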

Reference Data

Format	Delimiter	Unicode	Common Extension	RFC/Standard	Quoting Convention	Typical Use Case
CSV	,	U+002C	.csv	RFC 4180	Double-quote (")	Spreadsheets, databases
PSV	|	U+007C	.psv / .txt	No formal RFC	Double-quote (")	EDI, HL7, mainframes
TSV	\t	U+0009	.tsv / .tab	IANA text/tab-separated-values	Rarely quoted	Bioinformatics, linguistics
SSV (Space)	(space)	U+0020	.txt	None	Varies	Fixed-width legacy systems
Semicolon-SV	;	U+003B	.csv	None (European locale CSV)	Double-quote (")	European Excel exports
Colon-SV	:	U+003A	.txt	None	Varies	/etc/passwd, config files
Caret-SV	^	U+005E	.txt	None	Varies	Legacy data interchange
Tilde-SV	~	U+007E	.txt	None	Varies	EDI X12 segments
JSON Lines	Newline	U+000A	.jsonl	jsonlines.org	N/A (structured)	Log streaming, ML datasets
Fixed Width	Column positions	N/A	.dat / .txt	Varies per schema	None	COBOL, mainframe batch
ASCII Unit Sep	US	U+001F	.txt	ASCII control chars	None needed	High-reliability interchange
ASCII Record Sep	RS	U+001E	.txt	ASCII control chars	None needed	Multi-record binary streams

Frequently Asked Questions

If a source CSV field contains the pipe character (U+007C), the converter wraps that field in double-quotes in the PSV output. Any existing double-quotes inside the field are escaped as two consecutive double-quotes (""). This prevents the embedded pipe from being misinterpreted as a field delimiter during downstream parsing.
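Here is a way to check that a quoted pipe survives downstream parsing. The Python sketch reads a sample PSV record with the standard csv module set to a pipe delimiter. The record is made up, and the csv module stands in for whatever consumer reads the file.

```python
import csv
import io

# Hypothetical PSV record whose middle field contains a literal pipe.
psv = 'id|"A|B"|note\n'

# Splitting on every pipe ignores the quotes and yields four pieces.
naive = psv.rstrip("\n").split("|")

# A quote-aware reader configured for "|" recovers the original three fields.
fields = next(csv.reader(io.StringIO(psv, newline=""), delimiter="|"))

print(naive)   # ['id', '"A', 'B"', 'note']
print(fields)  # ['id', 'A|B', 'note']
```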

The parser correctly preserves literal newline characters (LF or CRLF) found within double-quoted fields, per RFC 4180 Section 2 Rule 6. These newlines appear in the PSV output within a quoted field. Naive line-by-line parsers will break on such data; this FSM-based parser does not.
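This Python sketch shows why line-by-line parsing breaks. It compares per-line splitting with stream parsing from the standard csv module, which stands in for the tool's FSM. The sample record is made up.

```python
import csv
import io

# A header plus one record whose quoted "comment" field contains a CRLF.
text = 'id,comment\r\n1,"first line\r\nsecond line"\r\n'

# Naive line-by-line parsing: each physical line is split on its own.
per_line = [line.split(",") for line in text.splitlines()]

# Stream parsing lets the reader carry the quoted field across the line break.
records = list(csv.reader(io.StringIO(text, newline="")))

print(per_line)  # [['id', 'comment'], ['1', '"first line'], ['second line"']]
print(records)   # [['id', 'comment'], ['1', 'first line\r\nsecond line']]
```

The naive approach turns one record into two malformed rows. The stream parser keeps the embedded CRLF inside the field.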

The tool defaults to comma as the input delimiter. European locale exports from Excel often use semicolons because the comma is the decimal separator in those locales. If you paste semicolon-delimited data without pre-processing it, each row is parsed as a single field, or is split wherever it contains a decimal comma. For best results, make sure your source file uses commas, or re-export it from your spreadsheet with comma delimiters.
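One way to do the pre-processing is to re-emit semicolon-delimited text as comma-delimited CSV before conversion. Here is a Python sketch of that step using the standard csv module. The function name `semicolon_to_comma` is hypothetical, and the sketch assumes the input uses standard double-quote quoting.

```python
import csv
import io


def semicolon_to_comma(text: str) -> str:
    """Re-emit semicolon-delimited text as comma-delimited CSV.

    Decimal commas such as 1,5 are quoted by the writer, so they stay
    in a single field during a later comma-to-pipe conversion.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";")
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(reader)
    return out.getvalue()


print(semicolon_to_comma("name;price\nWidget;1,5\n"))
# name,price
# Widget,"1,5"
```

The decimal comma is left as-is. Converting it to a decimal point is a separate choice that depends on the downstream system.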

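The converter processes files up to 50 MB in the browser. Files are read entirely into memory via the FileReader API. For datasets exceeding 50 MB, consider splitting the file or using a command-line tool such as awk: awk -F',' -v OFS='|' '{$1=$1; print}' input.csv > output.psv. The awk program must be in single quotes, because inside double quotes the shell expands $1 before awk sees it. Note that this awk approach does not handle quoted fields correctly: it splits on every comma, including commas inside quotes, and treats every newline as a record boundary.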
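Here is a quote-aware alternative to the awk one-liner. This Python sketch streams one record at a time with the standard csv module, so memory use is bounded by the largest record rather than by the file size. The function name `csv_file_to_psv` is hypothetical. The sketch assumes UTF-8 input and writes LF line endings.

```python
import csv


def csv_file_to_psv(src_path: str, dst_path: str) -> int:
    """Stream a CSV file to PSV one record at a time; return records written."""
    count = 0
    # newline="" lets the csv module handle CRLF and quoted newlines itself.
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        # QUOTE_MINIMAL (the default) quotes fields containing "|", '"',
        # or characters from lineterminator.
        writer = csv.writer(dst, delimiter="|", lineterminator="\n")
        for row in csv.reader(src):
            writer.writerow(row)
            count += 1
    return count
```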

Check row-length consistency: every record in valid PSV should have exactly N-1 unquoted pipe delimiters, where N is the column count from the header row. Pipes inside quoted fields are data, not delimiters. The converter displays the row count and column count after conversion. If your source CSV has R records and C columns, the output should report the same values. A record with a quoted newline spans more than one physical line but still counts as one record. Mismatched column counts indicate malformed source data.
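Counting pipe characters per line miscounts quoted pipes and multiline records, so a safer check parses the PSV first. Here is a Python sketch that does this with the standard csv module and reports every record whose field count differs from the header's. The function name `inconsistent_rows` is hypothetical.

```python
import csv
import io


def inconsistent_rows(psv_text: str) -> list[tuple[int, int]]:
    """Return (record_number, field_count) for each record whose width
    differs from the header's. Record numbers are 1-based and count
    parsed records, not physical lines."""
    records = list(csv.reader(io.StringIO(psv_text, newline=""), delimiter="|"))
    if not records:
        return []
    expected = len(records[0])
    return [(i, len(r)) for i, r in enumerate(records, start=1) if len(r) != expected]
```

For example, `inconsistent_rows('a|b|c\n1|"x|y"|3\n4|5\n')` flags only record 3, which has 2 fields. The quoted pipe in record 2 is not treated as a delimiter.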

Yes. An empty field between two commas (,,) produces an empty string field in the output, delimited by pipes (||). A trailing comma at the end of a row produces an additional empty field, consistent with the RFC 4180 grammar, in which a field may be empty. The parser does not silently drop empty trailing fields.
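This Python sketch shows the same empty-field behavior with the standard csv module, which stands in for the tool's parser.

```python
import csv

# Two adjacent commas and a trailing comma both yield empty fields.
fields = next(csv.reader(["a,,b,"]))
print(fields)            # ['a', '', 'b', '']
print("|".join(fields))  # a||b|
```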