About

Removing columns from a CSV file with a text editor is error-prone. One misplaced comma inside a quoted field breaks every downstream parser. This tool implements an RFC 4180-compliant parser that correctly handles quoted fields containing commas, newlines, and escaped double-quotes (""). It auto-detects the delimiter (comma, semicolon, tab, or pipe) by scoring how consistently each candidate splits the first 5 rows, and it preserves data integrity during column removal. All processing runs locally in your browser. No data leaves your machine.

Limitation: the tool assumes consistent column counts. Rows with fewer fields than the header are padded; rows with more are truncated to header length. Maximum file size is 100 MB. For files of 5 MB or more, parsing offloads to a Web Worker to keep the interface responsive. The output does not preserve the original quoting style: fields are quoted only where necessary (when they contain the delimiter, a double-quote, or a newline).

Formulas

Auto-delimiter detection scores each candidate delimiter by consistency of field counts across sampled rows:

score_d = (count(mode(fields_d)) / N) × mode(fields_d)

Where d is the candidate delimiter, fields_d is the array of field counts per row when split by d, mode returns the most frequent field count, count(mode) is how many rows match that mode, and N is the number of sampled rows. The delimiter with the highest score wins. Ties are broken by priority order: comma > semicolon > tab > pipe.
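
The scoring rule above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual code: the names score_delimiter and detect_delimiter are hypothetical, and the sketch splits rows with a plain string split, so it ignores quoting (an assumption made to keep the example short).

```python
from collections import Counter

# Candidates listed in the stated tie-break priority: comma > semicolon > tab > pipe.
CANDIDATES = [",", ";", "\t", "|"]


def score_delimiter(rows, delimiter):
    """Return (rows matching the modal field count / N) * modal field count."""
    if not rows:
        return 0.0
    # Assumption: naive split that does not honour quoted fields.
    field_counts = [len(row.split(delimiter)) for row in rows]
    modal_count, matches = Counter(field_counts).most_common(1)[0]
    return matches / len(rows) * modal_count


def detect_delimiter(text, sample_size=5):
    """Pick the highest-scoring candidate over the first sample_size non-empty lines."""
    rows = [line for line in text.splitlines() if line][:sample_size]
    # max() keeps the first maximal item, so the list order resolves ties.
    return max(CANDIDATES, key=lambda d: score_delimiter(rows, d))


print(repr(detect_delimiter("a;b;c\n1;2;3\n4;5;6")))
```

A delimiter that never occurs yields one field per row and therefore a score of 1, which is why a delimiter that splits rows consistently into several fields outranks it.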

Column removal reconstructs each row by filtering indices not in the deletion set D:

row′ = [row[j] for j ∈ 0..m−1 if j ∉ D]

Where m is total column count and D is the set of column indices selected for deletion. Re-serialization quotes any field containing the delimiter, a double-quote character, or a newline.
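
As a rough sketch of the filtering step, assuming rows are already parsed into lists of strings and column indices are zero-based (remove_columns is a hypothetical name used for illustration only):

```python
def remove_columns(rows, delete_indices):
    """Return new rows that keep only the columns whose index is not in delete_indices."""
    drop = set(delete_indices)  # set membership keeps the per-field check cheap
    return [
        [value for j, value in enumerate(row) if j not in drop]
        for row in rows
    ]


table = [
    ["id", "name", "email"],
    ["1", "Ann", "ann@example.org"],
]
print(remove_columns(table, {1}))
```

Because columns are addressed by index, this only behaves predictably when every row has the same number of fields.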

Reference Data

Delimiter	Symbol	Common File Extensions	Auto-Detected	Notes
Comma	,	.csv	Yes	RFC 4180 standard delimiter
Semicolon	;	.csv	Yes	Common in European locales where comma is decimal separator
Tab	\t	.tsv, .tab	Yes	Tab-separated values; rarely needs quoting
Pipe	|	.csv, .psv	Yes	Used in medical (HL7) and financial data feeds
Space		.txt	No	Ambiguous; use manual override
Colon	:	.csv	No	Rare; conflicts with time formats
Double Quote Escape	""	-	-	Two consecutive quotes represent a literal quote inside a quoted field
BOM (Byte Order Mark)	U+FEFF	-	Stripped	UTF-8 BOM at file start is automatically removed
CRLF Line Ending	\r\n	-	Handled	Windows-style line endings normalized
LF Line Ending	\n	-	Handled	Unix/macOS line endings
CR Line Ending	\r	-	Handled	Legacy Mac OS (pre-OS X)
Quoted Newline	"a\nb"	-	Handled	Newlines inside quoted fields are preserved, not treated as row breaks
Empty Field	,,	-	-	Two consecutive delimiters produce an empty string field
Trailing Delimiter	a,b,	-	-	Trailing delimiter creates one extra empty field

Frequently Asked Questions

The parser implements RFC 4180 rules. When it encounters an opening double-quote, it enters a quoted-field state and treats all characters (including commas, newlines, and carriage returns) as field content until it finds a closing double-quote not followed by another double-quote. Two consecutive double-quotes ("") inside a quoted field are interpreted as a single literal quote character. This prevents comma-containing addresses or multi-line descriptions from being incorrectly split.
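
The quoted-field state described above can be illustrated with a small character-by-character parser. This is a simplified sketch under stated assumptions, not the tool's implementation: parse_csv is a hypothetical name, a quote is treated as opening a quoted field only at the start of a field, and malformed input is not reported.

```python
def parse_csv(text, delimiter=","):
    """Parse CSV text into a list of rows, honouring quoted fields."""
    rows, row, field = [], [], []
    in_quotes = False
    pending = False  # True once the current row has consumed any character
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # "" inside quotes is one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # commas and newlines are plain content here
            pending = True
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as a single row break
            row.append("".join(field))
            rows.append(row)
            row, field, pending = [], [], False
        else:
            if ch == '"' and not field:
                in_quotes = True  # opening quote at the start of a field
            elif ch == delimiter:
                row.append("".join(field))
                field = []
            else:
                field.append(ch)
            pending = True
        i += 1
    if pending:  # flush a final row that has no trailing line break
        row.append("".join(field))
        rows.append(row)
    return rows


sample = 'name,address,note\r\n"Ann","12 Main St, Apt 4","said ""hi""\nthen left"\r\n'
for parsed_row in parse_csv(sample):
    print(parsed_row)
```

The second row comes back as three fields even though it contains a comma, a pair of escaped quotes, and a line break inside quotes.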

Rows with fewer fields than the detected header count are right-padded with empty strings. Rows with more fields than the header are truncated to match the header length. A warning toast is shown indicating the number of irregular rows found. This ensures column indices remain stable across all rows during deletion.
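
A minimal sketch of this padding and truncation, assuming the first row is the header and rows are lists of strings (normalize_rows is a hypothetical name; the returned count stands in for the number reported in the warning):

```python
def normalize_rows(rows):
    """Pad or truncate every row to the header width; also count irregular rows."""
    if not rows:
        return [], 0
    width = len(rows[0])  # assumption: the first row is the header
    normalized, irregular = [], 0
    for row in rows:
        if len(row) != width:
            irregular += 1
        # Multiplying a list by a negative number gives [], so long rows are only sliced.
        padded = row + [""] * (width - len(row))
        normalized.append(padded[:width])
    return normalized, irregular


fixed, irregular_count = normalize_rows([["a", "b", "c"], ["1"], ["1", "2", "3", "4"]])
print(fixed, irregular_count)
```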

No field values are altered. The tool only removes entire columns by index. During re-serialization, fields are re-quoted only when necessary: if a field contains the output delimiter, a double-quote, or a newline character. Fields that were originally quoted but contain none of these characters are output unquoted, which is valid per RFC 4180 but may differ from the original file's quoting style.
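
The quote-only-when-necessary rule might look like the sketch below. The function names are hypothetical, and treating a bare carriage return as a newline character for quoting purposes is an assumption of this example rather than something stated above.

```python
def serialize_field(value, delimiter=","):
    """Quote a field only if it contains the delimiter, a double-quote, or a line break."""
    # Assumption: "\r" counts as a newline character alongside "\n".
    if any(token in value for token in (delimiter, '"', "\n", "\r")):
        return '"' + value.replace('"', '""') + '"'
    return value


def serialize_row(row, delimiter=","):
    """Join the serialized fields of one row with the output delimiter."""
    return delimiter.join(serialize_field(value, delimiter) for value in row)


print(serialize_row(["Ann", "12 Main St, Apt 4", 'said "hi"']))
```

In this example the first field is written without quotes whether or not it was quoted in the input, which is the style difference noted above.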

The tool reads the first 5 rows and splits each by four candidate delimiters (comma, semicolon, tab, pipe). For each delimiter, it calculates a score based on how consistently it produces the same field count across rows, weighted by the number of fields. Override manually when your file uses an uncommon delimiter, has fewer than 2 rows, or when the data itself contains high frequencies of multiple delimiter characters (e.g., pipe-delimited data full of commas in quoted fields).

Maximum file size is 100 MB. Files under 5 MB are parsed on the main thread. Files between 5 and 100 MB are offloaded to a Web Worker to prevent the UI from freezing. A progress indicator is shown during processing. For files exceeding 100 MB, consider splitting the file first or using a command-line tool like csvkit.

The file is read as UTF-8 by default. A UTF-8 BOM (U+FEFF) at the start of the file is automatically stripped. If your file uses a different encoding (e.g., ISO-8859-1 or Windows-1252), byte sequences that are not valid UTF-8, typically accented or other non-ASCII characters, may display as replacement characters (U+FFFD). In such cases, re-save your file as UTF-8 before processing.
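
The effect can be reproduced with a short sketch. It assumes lenient decoding, where invalid bytes become U+FFFD instead of raising an error, and decode_csv_bytes is a hypothetical helper name.

```python
def decode_csv_bytes(data: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences, then drop a leading BOM."""
    text = data.decode("utf-8", errors="replace")
    return text[1:] if text.startswith("\ufeff") else text


with_bom = b"\xef\xbb\xbfid,name\n"   # UTF-8 BOM followed by ASCII content
latin1 = "café".encode("latin-1")     # the single byte 0xE9 is not valid UTF-8 here

print(repr(decode_csv_bytes(with_bom)))
print(repr(decode_csv_bytes(latin1)))
```

The second call shows the accented character arriving as a replacement character, which is the symptom to look for before re-saving the file as UTF-8.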