About

Changing a CSV delimiter appears trivial until a quoted field contains the delimiter character itself. A naive find-and-replace corrupts the data. This tool implements a full RFC 4180-compliant state machine parser that tracks whether the scanner is inside a quoted region, correctly handling escaped quotes (""), embedded newlines, and Unicode BOM prefixes. The auto-detect algorithm samples the first 20 lines outside quoted regions and selects the candidate delimiter with the lowest variance in per-line occurrence count. Typical failure scenario: importing a European-locale CSV (semicolon-delimited) into a system expecting commas destroys numeric columns where commas serve as decimal separators.

The converter re-serializes each field, applying quoting only when the target delimiter, a double-quote, or a newline appears within the field value. This minimizes unnecessary quoting and keeps output files compact. Limitation: this tool treats the input as plain text. It does not validate data types, detect encoding beyond UTF-8/ASCII, or handle fixed-width formats. For files exceeding 1 MB, processing moves to a background thread to prevent UI blocking.

Formulas

The parser operates as a finite state machine with three states:

S₀ = FIELD_START - expecting field content or opening quote
S₁ = UNQUOTED - reading until delimiter or newline
S₂ = QUOTED - reading until unescaped closing quote
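The three states can be sketched as a small character-by-character parser. This is an illustrative Python sketch, not the tool's actual source: the names `State` and `parse_csv` are hypothetical, and the handling of a quote that appears in the middle of an unquoted field (kept as a literal character) is an assumption.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()  # S0: expecting field content or an opening quote
    UNQUOTED = auto()     # S1: reading until delimiter or newline
    QUOTED = auto()       # S2: reading until an unescaped closing quote


def parse_csv(text: str, delimiter: str = ",") -> list[list[str]]:
    """Split text into rows of fields using the three states above."""
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    state = State.FIELD_START
    i, n = 0, len(text)

    def end_field() -> None:
        row.append("".join(field))
        field.clear()

    while i < n:
        ch = text[i]
        if state is State.QUOTED:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')       # escaped quote ("") becomes one literal quote
                    i += 1
                else:
                    state = State.UNQUOTED  # closing quote; the rest is read as plain text
            else:
                field.append(ch)            # delimiters and newlines are literal here
        elif ch == '"' and state is State.FIELD_START:
            state = State.QUOTED            # opening quote at the start of a field
        elif ch == delimiter:
            end_field()
            state = State.FIELD_START
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                      # treat CRLF as a single row terminator
            end_field()
            rows.append(row)
            row = []
            state = State.FIELD_START
        else:
            field.append(ch)                # assumption: a mid-field quote is kept literally
            state = State.UNQUOTED
        i += 1

    # Flush a final record that has no trailing newline.
    if field or row or state is not State.FIELD_START:
        end_field()
        rows.append(row)
    return rows
```

Because delimiters and newlines are only acted on outside the QUOTED state, a field such as "Smith; J." survives intact even when the source delimiter is a semicolon.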

Delimiter auto-detection scores each candidate d by computing the variance σ² of per-line counts across the sample:

$$\sigma^2_d = \frac{1}{n} \sum_{i=1}^{n} (c_i - \bar{c})^2$$

Where c_i is the count of delimiter d on line i (outside quotes), n is the number of sampled lines, and c̄ (written \bar{c} in the formula) is the mean count. The candidate with the lowest σ² and c̄ ≥ 1 is selected. Ties are broken by priority order: , > ; > \t > |.
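The scoring rule can be sketched in Python as follows. The function names, the `CANDIDATES` list, and the decision to return `None` when no candidate qualifies are assumptions made for illustration. The sketch also assumes the sampled lines contain no embedded newlines, so the quote flag is reset on every line.

```python
from typing import Optional

CANDIDATES = [",", ";", "\t", "|"]  # listed in tie-break priority order


def count_outside_quotes(line: str, delimiter: str) -> int:
    """Count occurrences of delimiter on one line, skipping quoted regions."""
    in_quotes = False
    count = 0
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes  # an escaped "" toggles twice, so it cancels out
        elif ch == delimiter and not in_quotes:
            count += 1
    return count


def detect_delimiter(text: str, sample_size: int = 20) -> Optional[str]:
    """Return the candidate with the lowest population variance and mean >= 1."""
    lines = [ln for ln in text.splitlines() if ln][:sample_size]
    if not lines:
        return None
    n = len(lines)
    scored = []
    for priority, d in enumerate(CANDIDATES):
        counts = [count_outside_quotes(ln, d) for ln in lines]
        mean_c = sum(counts) / n
        if mean_c < 1:
            continue  # the delimiter must appear about once per line on average
        variance = sum((c - mean_c) ** 2 for c in counts) / n
        scored.append((variance, priority, d))
    if not scored:
        return None
    # Lowest variance wins; the priority index breaks ties.
    return min(scored)[2]
```

In a semicolon-delimited file with European decimals, the comma count varies from line to line and averages below one, while the semicolon count is constant, so the semicolon is selected.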

Re-serialization rule: a field f is quoted in the output if and only if f contains the target delimiter, a double-quote character, or a newline character.
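A minimal Python sketch of this quoting rule is shown below. The function names are hypothetical, and treating a bare carriage return as a newline character is an assumption of the sketch.

```python
def serialize_field(value: str, delimiter: str) -> str:
    """Quote a field only if it contains the delimiter, a quote, or a newline."""
    needs_quotes = (
        delimiter in value
        or '"' in value
        or "\n" in value
        or "\r" in value  # assumption: a bare CR also counts as a newline character
    )
    if not needs_quotes:
        return value
    # Inside a quoted field, every double quote is escaped by doubling it.
    return '"' + value.replace('"', '""') + '"'


def serialize_row(fields: list[str], delimiter: str) -> str:
    """Join already-parsed fields into one output record (without a terminator)."""
    return delimiter.join(serialize_field(f, delimiter) for f in fields)
```

For a comma target, `serialize_row(["3,14", 'say "hi"', "plain; text"], ",")` quotes the first two fields and leaves the third untouched, since a semicolon is harmless once the delimiter is a comma.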

Reference Data

Delimiter	Symbol	Common Name	Typical Use Case	RFC 4180	Risk with Naive Replace
Comma	,	CSV	US/UK locale exports, most APIs	Yes (default)	Breaks European decimals (3,14)
Semicolon	;	CSV (EU)	Excel exports in EU locales	No (extension)	Rare in field data
Tab	\t	TSV	Database exports, bioinformatics	No	Invisible character confusion
Pipe	|	PSV	Legacy mainframe systems, HL7	No	Rare in natural text
Colon	:	Colon-SV	/etc/passwd, some configs	No	Breaks time values (14:30)
Space	(space)	SSV	Fixed-width approximations	No	Breaks multi-word fields
Caret	^	Caret-SV	EDI, some financial feeds	No	Low risk
Tilde	~	Tilde-SV	Custom enterprise exports	No	Low risk
Unit Separator	US (0x1F)	ASCII 31	Machine-to-machine data	No	Not human-readable
SOH	SOH (0x01)	ASCII 1	FIX protocol tag separator	No	Not human-readable
Double Quote	"	Quote char	Field enclosure (not delimiter)	Yes (enclosure)	Must be escaped as ""
Newline	\n / \r\n	Row separator	Record boundary	Yes (CRLF)	Embedded newlines in quotes

Frequently Asked Questions

The auto-detect scanner tracks a quoted-state flag. When it encounters an unescaped double-quote, it toggles the flag. Only commas (or any candidate delimiter) encountered while the flag is off are counted. This means commas inside properly quoted fields like "New York, NY" are excluded from the frequency analysis. If your file has inconsistent quoting, auto-detection may fail - in that case, manually select the source delimiter.

The converter treats all fields as opaque text strings. A field containing "3,14" will be correctly preserved and wrapped in quotes in the output: "3,14". The comma inside the field is not interpreted as a delimiter because the re-serializer detects that the field value contains the target delimiter character and applies RFC 4180 quoting automatically.
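The same behavior can be reproduced with Python's standard `csv` module, which is used here as a stand-in and is not the tool's own parser. The helper name `convert` is hypothetical. `csv.QUOTE_MINIMAL` quotes a field only when it contains the delimiter, the quote character, or a line-terminator character.

```python
import csv
import io


def convert(text: str, source: str, target: str) -> str:
    """Re-delimit CSV text, treating every field as an opaque string."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=source)
    out = io.StringIO(newline="")
    writer = csv.writer(
        out,
        delimiter=target,
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",
    )
    writer.writerows(reader)
    return out.getvalue()


# A European decimal survives the semicolon-to-comma conversion.
print(convert("price;qty\n3,14;2\n", ";", ","))
```

The value 3,14 is written as "3,14" in the comma-delimited output, so a downstream reader still sees a single field.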

The custom delimiter input accepts a single character only. Multi-character delimiters are not part of the CSV specification and require a different parsing strategy (essentially a different file format). If you need multi-character delimiter support, pre-process the file to replace the multi-char sequence with a single unused character (such as the ASCII Unit Separator 0x1F), then use this tool.
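The pre-processing step might look like the following Python sketch. The function name is hypothetical, and the sketch assumes the multi-character sequence never occurs inside a field value, since a plain text replacement cannot tell field content from separators.

```python
UNIT_SEPARATOR = "\x1f"  # ASCII 31


def collapse_delimiter(text: str, multi_char_delimiter: str,
                       placeholder: str = UNIT_SEPARATOR) -> str:
    """Replace a multi-character delimiter with a single unused character."""
    if placeholder in text:
        # The placeholder must be unused, or existing data would turn into separators.
        raise ValueError("placeholder already occurs in the input; choose another character")
    return text.replace(multi_char_delimiter, placeholder)
```

The result can then be loaded with 0x1F entered as the custom source delimiter.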

Rows that split unexpectedly in the output typically trace back to newline characters (\n or \r\n) embedded inside quoted fields in the source file. If you selected the wrong source delimiter, the parser may not correctly identify field boundaries, causing quoted newlines to be treated as row separators. Verify that the detected source delimiter matches your file. Also check that every literal quote in the original file is escaped by doubling it ("").

The tool accepts files up to 50 MB. Files larger than 1 MB are processed in a Web Worker (background thread) to keep the UI responsive, with a progress indicator. The preview table always shows only the first 10 rows regardless of file size. For files approaching 50 MB, ensure your browser tab has sufficient memory - Chrome typically allows 1-4 GB per tab.

The tool reads files as UTF-8 text via the FileReader API. A UTF-8 BOM (byte order mark, 0xEF 0xBB 0xBF) at the start of the file is automatically stripped. Non-UTF-8 encodings (e.g., ISO-8859-1, Shift-JIS) may produce garbled characters. Convert your file to UTF-8 before using this tool if you see encoding artifacts in the preview.
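The BOM handling can be illustrated outside the browser with a short Python sketch. The function name is hypothetical, and decoding with `errors="replace"` is an assumption chosen to mimic the garbled characters described above rather than to raise an error.

```python
import codecs


def decode_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8, dropping a leading BOM (EF BB BF) if present."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Invalid byte sequences become U+FFFD, the usual sign of a wrong encoding.
    return data.decode("utf-8", errors="replace")
```

A file saved as ISO-8859-1 that contains "café" decodes with a replacement character in place of the final letter, which is the kind of artifact to look for in the preview.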