About

Switching column delimiters in structured text is deceptively error-prone. A naive find-and-replace fails when the source delimiter appears inside quoted fields, when consecutive delimiters represent empty columns, or when mixed line endings corrupt row boundaries. Mishandling these edge cases corrupts datasets by merging columns, shifting values, or silently dropping fields. This tool implements RFC 4180-aware parsing that respects double-quoted fields: a comma inside "New York, NY" is preserved literally, not treated as a column break. It auto-detects the input delimiter by checking how consistently each candidate splits the first 20 rows into the same number of columns.
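To make the failure mode concrete, here is a minimal sketch that contrasts a plain find-and-replace with a parse-then-serialize conversion. It uses Python's standard csv module as a stand-in for the tool's own parser, so it only covers single-character delimiters; the function names naive_convert and quote_aware_convert are hypothetical.

```python
import csv
import io


def naive_convert(text: str, src: str, tgt: str) -> str:
    """Plain find-and-replace: ignores quoting entirely."""
    return text.replace(src, tgt)


def quote_aware_convert(text: str, src: str, tgt: str) -> str:
    """Parse rows with the source delimiter, then re-serialize with the target.

    Assumption: csv.reader/csv.writer stand in for the tool's parser,
    so src and tgt must each be a single character.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=src)
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=tgt, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


row = 'Alice,"New York, NY",30\n'
naive = naive_convert(row, ",", ";")        # also rewrites the comma inside the quotes
aware = quote_aware_convert(row, ",", ";")  # keeps "New York, NY" as one field
```

The naive version turns the city into two columns as soon as the output is read with a semicolon delimiter, while the quote-aware version keeps three fields.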

The converter handles tab (\t), comma, semicolon, pipe (|), colon, space, and arbitrary multi-character delimiters. Options for trimming cell whitespace and collapsing consecutive delimiters give fine control over output formatting. Limitation: this tool processes plain-text columnar data. It does not parse binary formats like .xlsx or fixed-width column layouts. Pro tip: European CSV files often use semicolons because the comma serves as a decimal separator in those locales.

Formulas

The core operation is a parse-then-serialize pipeline. Each row of input text is tokenized respecting quoted fields, then re-joined with the target delimiter.

output = join(parse(row, d_src), d_tgt)

Where d_src is the source delimiter and d_tgt is the target delimiter. The parse function implements a finite state machine with three states:

S ∈ {FIELD_START, IN_QUOTED, IN_UNQUOTED}

When S = IN_QUOTED, delimiter characters are treated as literal content. A double-quote inside a quoted field is escaped as "" per RFC 4180. Auto-detection scores each candidate delimiter d_c by computing column count variance across sampled rows:

score(d_c) = 1 / (σ²(counts) + 1) × mean(counts)

The delimiter with the highest score (lowest variance and highest mean count) wins. A perfect delimiter produces identical column counts on every row, yielding σ² = 0.
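The scoring rule can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: it uses population variance, treats a candidate with fewer than 2 columns on average as a non-match, relies on Python's csv module for quote-aware splitting, and breaks ties by candidate order with comma first. The names CANDIDATES, score, and detect are hypothetical.

```python
import csv
from statistics import mean, pvariance

# Candidate order doubles as the tiebreaker: comma comes first.
CANDIDATES = [",", "\t", ";", "|", ":", " "]


def score(text: str, delim: str, sample_rows: int = 20) -> float:
    """Mean column count divided by (variance of column counts + 1)."""
    # Simplification: rows are split on line breaks, so quoted fields
    # that contain newlines are not handled in this sketch.
    lines = [ln for ln in text.splitlines() if ln.strip()][:sample_rows]
    if not lines:
        return 0.0
    counts = [len(row) for row in csv.reader(lines, delimiter=delim)]
    if mean(counts) < 2:  # assumption: a real delimiter yields >= 2 columns
        return 0.0
    return mean(counts) / (pvariance(counts) + 1)


def detect(text: str) -> str:
    """Return the highest-scoring candidate; max() keeps the first on ties."""
    return max(CANDIDATES, key=lambda d: score(text, d))
```

For three rows that each split into three columns on semicolons, the variance is 0 and the score equals the mean column count, 3.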

Reference Data

Delimiter	Symbol	Escape Sequence	Common Format	RFC / Standard	Typical Use Case
Comma	,	Quoted field	CSV	RFC 4180	Spreadsheets, data export
Tab	\t	Rarely needed	TSV	IANA TSV	Bioinformatics, database dumps
Semicolon	;	Quoted field	CSV (EU)	De facto	European locale CSV
Pipe	|	Backslash or quote	PSV	HL7, EDI	Healthcare, legacy systems
Colon	:	Backslash	/etc/passwd	POSIX	Unix config files
Space	␣	Quoted field	SSV	None	Log files, CLI output
Tilde	~	None standard	Custom	None	Legacy mainframe exports
Caret	^	None standard	Custom	None	Special data feeds
Double Pipe	||	None standard	Custom	None	Multi-char delimited logs
SOH (\x01)	^A	N/A	FIX Protocol	FIX 4.x	Financial trading messages
Unit Sep (\x1F)	US	N/A	ASCII delimited	ISO 646	Data interchange
Null (\0)	NUL	N/A	xargs -0	POSIX	Filenames with spaces
Hash	#	None standard	Custom	None	Color codes, config
Ampersand	&	None standard	Query string	RFC 3986	URL parameters
Equals	=	URL encoding	Key-value	RFC 3986	Config files, env vars

Frequently Asked Questions

The parser implements a finite state machine that follows the quoting rules of RFC 4180. When it encounters an opening double-quote at field start, it enters the IN_QUOTED state. All characters, including the source delimiter, are treated as literal field content until a closing quote is found. A literal quote inside a quoted field must be escaped as two consecutive quotes (""). This means a field like "Smith, John" preserves the comma as data, not as a column separator.
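A minimal sketch of such a state machine for a single row, using the three state names from the Formulas section. The function name parse_row and its exact transitions are assumptions for illustration; in particular, this sketch treats any text after a closing quote as literal content and does not handle quoted fields that span multiple lines.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()
    IN_QUOTED = auto()
    IN_UNQUOTED = auto()


def parse_row(row: str, delim: str = ",") -> list[str]:
    """Split one row into fields, honoring double-quoted fields."""
    fields: list[str] = []
    buf: list[str] = []
    state = State.FIELD_START
    i = 0
    while i < len(row):
        ch = row[i]
        if state is State.IN_QUOTED:
            if ch == '"':
                if row[i + 1:i + 2] == '"':  # "" is an escaped literal quote
                    buf.append('"')
                    i += 2
                    continue
                state = State.IN_UNQUOTED    # closing quote ends the quoted part
            else:
                buf.append(ch)               # delimiters are literal in here
        elif row.startswith(delim, i):       # works for multi-character delimiters
            fields.append("".join(buf))
            buf = []
            state = State.FIELD_START
            i += len(delim)
            continue
        elif state is State.FIELD_START and ch == '"':
            state = State.IN_QUOTED          # opening quote at field start
        else:
            buf.append(ch)
            state = State.IN_UNQUOTED
        i += 1
    fields.append("".join(buf))              # flush the final field
    return fields
```

Because the delimiter check uses startswith, the same loop also splits on multi-character strings such as ||.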

By default, consecutive delimiters are preserved as empty fields. For example, the input a,,c with comma delimiter produces three fields: a, (empty), and c. If you enable the "Collapse consecutive delimiters" option, adjacent delimiters are merged into one, which is useful for space-separated log files where multiple spaces align columns visually but represent a single separator.
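The difference between the two modes can be sketched with a small helper. The name split_fields and the collapse flag are hypothetical, and the sketch deliberately ignores quoting to keep the contrast visible.

```python
import re


def split_fields(line: str, delim: str = ",", collapse: bool = False) -> list[str]:
    """Split a line; optionally treat a run of delimiters as one separator."""
    if collapse:
        # One or more consecutive copies of the delimiter count as a single break.
        pattern = "(?:" + re.escape(delim) + ")+"
        return re.split(pattern, line)
    return line.split(delim)


preserved = split_fields("a,,c")                  # empty middle field is kept
collapsed = split_fields("a,,c", collapse=True)   # adjacent commas merged
log_cols = split_fields("GET   /index.html   200", " ", collapse=True)
```

Collapsing is lossy for data where an empty column is meaningful, which is why it suits visually aligned log output better than exported tables.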

Custom delimiters are supported. Select "Custom" from the delimiter dropdown and type any string, including multi-character sequences like || or :: or even words like DELIM. The parser splits on the exact string match. Note that multi-character delimiters cannot be auto-detected; you must specify them manually.

Auto-detection samples the first 20 rows and tests six candidates: comma, tab, semicolon, pipe, colon, and space. It calculates the variance of column counts per candidate. A delimiter that produces consistent column counts (variance near zero) with at least 2 columns scores highest. It works reliably for well-structured data but may fail on single-column input, heavily irregular files, or when multiple candidates produce identical scores. In ambiguous cases, comma is preferred as the tiebreaker because it is the most common delimiter and the one RFC 4180 describes.

Fields are re-quoted when the output requires it. After parsing, the serializer checks each field for the presence of the target delimiter, double-quotes, or newline characters. If any are found, the field is wrapped in double-quotes and internal quotes are escaped as "". This ensures the output can be re-imported by any RFC 4180-compliant parser.
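The re-quoting check can be sketched in a few lines. The helper names quote_if_needed and serialize_row are hypothetical; the rule itself follows the description above (target delimiter, double-quote, or line break triggers quoting).

```python
def quote_if_needed(field: str, delim: str) -> str:
    """Wrap a field in double-quotes only when it would otherwise be ambiguous."""
    needs_quotes = (
        delim in field or '"' in field or "\n" in field or "\r" in field
    )
    if needs_quotes:
        # Internal quotes are doubled, as in RFC 4180.
        return '"' + field.replace('"', '""') + '"'
    return field


def serialize_row(fields: list[str], delim: str) -> str:
    """Join fields with the target delimiter, quoting where required."""
    return delim.join(quote_if_needed(f, delim) for f in fields)


line = serialize_row(["Smith; John", 'say "hi"', "x"], ";")
```

Only the first two fields are quoted here: one contains the target delimiter and the other contains double-quotes.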

The tool processes text entirely in browser memory. Practical limits depend on the device - typically 50-200 MB of raw text on modern desktops. For files exceeding roughly 50,000 lines, the tool uses chunked processing with UI progress feedback to prevent the browser from becoming unresponsive. There is no server upload; all processing is local and private.
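The chunking idea can be illustrated with a generator that converts a fixed number of lines at a time and reports progress after each batch. This is a sketch only: the name convert_in_chunks, the chunk size of 5000, and the progress tuple are assumptions, and in a browser the caller would hand control back to the UI between chunks.

```python
from typing import Callable, Iterator


def convert_in_chunks(
    lines: list[str],
    convert_line: Callable[[str], str],
    chunk_size: int = 5000,  # assumed value, not taken from the tool
) -> Iterator[tuple[int, list[str]]]:
    """Yield (lines_done, converted_chunk) so the caller can update progress."""
    for start in range(0, len(lines), chunk_size):
        chunk = [convert_line(ln) for ln in lines[start:start + chunk_size]]
        yield min(start + chunk_size, len(lines)), chunk


progress = list(
    convert_in_chunks(["a,b", "c,d", "e,f"], lambda s: s.replace(",", ";"), chunk_size=2)
)
```

Each yielded tuple carries the running line count, which is enough to drive a progress bar without holding a second full copy of the output per step.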