Drop a CSV file here, paste text, or click Open File. Supported extensions: .csv, .tsv, .tab, .txt

About

Misread CSV data costs hours of debugging and can propagate errors through entire data pipelines. A raw text editor cannot distinguish between a comma inside a quoted field and a delimiter comma. This tool implements RFC 4180 parsing with full support for quoted fields, embedded newlines, and escaped double-quotes (""). It auto-detects delimiters across comma, semicolon (;), tab (\t), and pipe (|) by frequency analysis of the first 10 lines. All processing runs locally in your browser. No data leaves your machine.

Sorting uses natural order comparison, so item2 sorts before item10. Column statistics report the sum, average, minimum, and maximum for numeric columns, and the unique value count for text columns. Limitation: files exceeding ~50 MB may cause browser memory pressure on low-end devices. For datasets beyond 100,000 rows, a database tool is more appropriate.

Formulas

Delimiter auto-detection scores each candidate by occurrence frequency across the first 10 rows:

$$\mathrm{score}_d = \frac{1}{n} \sum_{i=1}^{n} \mathrm{count}(d, \mathrm{row}_i)$$

where d is the candidate delimiter, n is the number of sample rows (up to 10), and the delimiter with the highest consistent score and lowest variance across rows wins. Consistency is measured by standard deviation of per-row counts:

$$\sigma_d = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (c_i - \bar{c})^2}$$

where c_i is the count of delimiter d in row i, and c̄ is the mean count. The delimiter with the lowest σ_d and score_d ≥ 1 is selected.

Natural sort comparison splits strings into numeric and alphabetic chunks, comparing numeric chunks by value and alphabetic chunks lexicographically.
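
The scoring rule above can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not the tool's actual source: the names `scoreDelimiter` and `detectDelimiter` are hypothetical, ties on σ_d are broken by the higher mean (an assumption), and delimiters inside quoted fields are counted like any other, which a real implementation may want to avoid.

```typescript
// Hypothetical sketch of delimiter auto-detection; names and tie-breaking are assumptions.
const CANDIDATES: string[] = [",", ";", "\t", "|"];

interface DelimiterScore {
  delimiter: string;
  mean: number;   // score_d: mean occurrences per sampled row
  stdDev: number; // sigma_d: population standard deviation of per-row counts
}

function scoreDelimiter(rows: string[], delimiter: string): DelimiterScore {
  // Count occurrences of the delimiter in each row.
  const counts = rows.map((row) => row.split(delimiter).length - 1);
  const n = counts.length;
  const mean = counts.reduce((a, b) => a + b, 0) / n;
  const variance =
    counts.reduce((acc, c) => acc + (c - mean) * (c - mean), 0) / n;
  return { delimiter, mean, stdDev: Math.sqrt(variance) };
}

function detectDelimiter(text: string, sampleSize: number = 10): string | null {
  // Simplification: rows are split on line breaks, ignoring quoted newlines.
  const rows = text
    .split(/\r?\n/)
    .filter((r) => r.length > 0)
    .slice(0, sampleSize);
  if (rows.length === 0) return null;

  let best: DelimiterScore | null = null;
  for (const d of CANDIDATES) {
    const s = scoreDelimiter(rows, d);
    if (s.mean < 1) continue; // require score_d >= 1
    if (
      best === null ||
      s.stdDev < best.stdDev ||
      (s.stdDev === best.stdDev && s.mean > best.mean)
    ) {
      best = s;
    }
  }
  return best === null ? null : best.delimiter;
}
```

In this sketch a `null` result means that no candidate reached a mean of one occurrence per row, which is the single-column case where manual selection is the fallback.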

Reference Data

Delimiter	Symbol	Common File Extension	Typical Use Case	RFC Standard
Comma	,	.csv	General data exchange	RFC 4180
Semicolon	;	.csv	European locale (comma = decimal)	No formal RFC
Tab	\t	.tsv / .tab	Bioinformatics, spreadsheets	IANA text/tab-separated-values
Pipe	|	.csv / .txt	Legacy mainframe exports	No formal RFC
Double-quote escape	""	-	Embedding quotes in fields	RFC 4180 §2.7
CRLF line ending	\r\n	-	Windows-origin files	RFC 4180 §2.1
LF line ending	\n	-	Unix/macOS-origin files	De facto standard
BOM marker	U+FEFF	-	UTF-8 with BOM (Excel export)	Unicode §23.8
Header row	-	-	First row as column names	RFC 4180 §2.3 (optional)
Empty field	,,	-	Missing / null data	RFC 4180 §2.6
Newline in field	"a\nb"	-	Multi-line cell content	RFC 4180 §2.6
UTF-8 encoding	-	-	International characters	RFC 3629

Frequently Asked Questions

How does delimiter auto-detection work, and when can it fail?

The parser samples the first 10 rows and counts occurrences of each candidate delimiter (comma, semicolon, tab, pipe). The candidate with the highest average count and lowest variance wins. It can fail on files with fewer than 2 rows, files where multiple delimiters appear equally often, or single-column files with no delimiters. In ambiguous cases, switch to manual delimiter selection.

Does the parser handle quoted fields that contain commas or line breaks?

Yes. The parser follows RFC 4180: fields wrapped in double-quotes preserve embedded commas and newlines as literal content. Escaped double-quotes (two consecutive double-quote characters within a quoted field) are collapsed to a single double-quote. This is the same logic Excel uses when exporting CSV.
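
The quoting rules described above can be illustrated with a small character-by-character state machine. This is a minimal sketch, not the tool's implementation: `parseCsv` is a hypothetical name, a leading BOM is skipped, and malformed input (for example, text after a closing quote) is handled leniently rather than rejected.

```typescript
// Hypothetical RFC 4180-style parser sketch; lenient on malformed input.
function parseCsv(text: string, delimiter: string = ","): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;
  let pending = false; // true while the current record has unflushed content
  let i = text.charCodeAt(0) === 0xfeff ? 1 : 0; // skip a leading BOM

  while (i < text.length) {
    const ch = text.charAt(i);
    if (inQuotes) {
      if (ch === '"' && text.charAt(i + 1) === '"') {
        field += '"'; // "" inside quotes collapses to one double-quote
        i += 2;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
        i += 1;
      } else {
        field += ch; // commas and newlines are literal inside quotes
        i += 1;
      }
      continue;
    }
    if (ch === "\r" || ch === "\n") {
      if (ch === "\r" && text.charAt(i + 1) === "\n") i += 1; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
      pending = false;
    } else {
      pending = true;
      if (ch === '"' && field.length === 0) {
        inQuotes = true; // opening quote at the start of a field
      } else if (ch === delimiter) {
        row.push(field);
        field = "";
      } else {
        field += ch;
      }
    }
    i += 1;
  }
  // Flush the last record when the input has no trailing line break.
  if (pending) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}
```

For example, the record `"Smith, J.","He said ""hi"""` yields two fields, `Smith, J.` and `He said "hi"`.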

What happens when rows have different numbers of columns?

Short rows are padded with empty strings to match the header column count. Long rows retain all their values; the extra values appear in unnamed columns (Column N+1, and so on). A warning toast appears if more than 5% of rows have an inconsistent length.
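
The padding behaviour described above might look like the following sketch. The function name `normalizeRows` and the returned shape are hypothetical, and one detail is an assumption: when long rows widen the table, short rows are padded to the widened width so the result stays rectangular.

```typescript
// Hypothetical sketch of row-length normalization; names are assumptions.
interface NormalizedTable {
  header: string[];
  rows: string[][];
  inconsistentRatio: number; // share of rows whose length differs from the header
}

function normalizeRows(header: string[], rows: string[][]): NormalizedTable {
  // The table is as wide as the header or the longest row, whichever is larger.
  const width = rows.reduce((m, r) => Math.max(m, r.length), header.length);

  // Name extra columns "Column N+1", "Column N+2", and so on.
  const fullHeader = header.slice();
  for (let i = header.length; i < width; i++) {
    fullHeader.push("Column " + (i + 1));
  }

  let inconsistent = 0;
  const padded = rows.map((r) => {
    if (r.length !== header.length) inconsistent += 1;
    const copy = r.slice();
    while (copy.length < width) copy.push(""); // pad short rows with empty strings
    return copy;
  });

  const inconsistentRatio = rows.length === 0 ? 0 : inconsistent / rows.length;
  return { header: fullHeader, rows: padded, inconsistentRatio };
}
```

A caller could then show the warning when `inconsistentRatio > 0.05`, mirroring the 5% threshold mentioned above.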

How are values sorted?

Sorting uses natural order comparison. A column with values like "item1", "item10", "item2" sorts as item1 → item2 → item10, not item1 → item10 → item2. Pure numeric columns are compared by floating-point value. Empty cells always sort last regardless of direction.
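
A natural order comparator along these lines can be sketched as follows. The names `naturalCompare` and `sortColumn` are hypothetical, the sketch covers only the text path (it does not include the separate floating-point comparison for pure numeric columns), and a mixed numeric/alphabetic chunk pair is compared as plain strings, which is an assumption.

```typescript
// Hypothetical natural-sort sketch; names and mixed-chunk handling are assumptions.
function splitChunks(s: string): string[] {
  // Split into alternating runs of digits and non-digits.
  const m = s.match(/\d+|\D+/g);
  return m === null ? [] : m.slice();
}

function naturalCompare(a: string, b: string): number {
  const ca = splitChunks(a);
  const cb = splitChunks(b);
  const n = Math.min(ca.length, cb.length);
  for (let i = 0; i < n; i++) {
    const x = ca[i] ?? "";
    const y = cb[i] ?? "";
    if (/^\d/.test(x) && /^\d/.test(y)) {
      // Both chunks are numeric: compare by value.
      const diff = parseInt(x, 10) - parseInt(y, 10);
      if (diff !== 0) return diff;
    } else if (x !== y) {
      // Otherwise compare lexicographically.
      return x < y ? -1 : 1;
    }
  }
  return ca.length - cb.length; // the shorter string sorts first on a shared prefix
}

function sortColumn(values: string[], descending: boolean = false): string[] {
  // Empty cells are set aside and appended last in both directions.
  const filled = values.filter((v) => v !== "");
  const empties = values.filter((v) => v === "");
  filled.sort((a, b) =>
    descending ? naturalCompare(b, a) : naturalCompare(a, b)
  );
  return filled.concat(empties);
}
```

In browsers, `Intl.Collator` with the `numeric: true` option is a built-in alternative that gives similar ordering for digit runs.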

Is there a file size limit?

No hard limit is enforced. Practical limits depend on browser memory. Files up to ~30 MB and ~200,000 rows work smoothly. Beyond that, pagination keeps the DOM manageable but initial parsing may take several seconds. A progress indicator appears during parsing. For files over 100 MB, consider a desktop tool like LibreOffice or a database import.

How are numeric columns detected and column statistics computed?

Each cell is tested with parseFloat. If more than 50% of non-empty cells in a column parse as valid numbers, the column is treated as numeric, and the sum, average, min, and max are computed. Otherwise, only the row count and unique value count are shown. NaN, Infinity, and empty cells are excluded from numeric aggregation.
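
The detection rule above can be sketched as below. The name `columnStats` and the result shape are hypothetical. Two details are assumptions: the row count includes empty cells, and the unique count covers non-empty values only. Note that `parseFloat` accepts a leading numeric prefix, so a cell such as `12abc` parses as 12; a stricter check would be needed to reject such values.

```typescript
// Hypothetical column-statistics sketch; names and counting choices are assumptions.
interface ColumnStats {
  count: number;   // total cells, including empty ones (assumption)
  unique: number;  // distinct non-empty values (assumption)
  numeric: boolean;
  sum?: number;
  average?: number;
  min?: number;
  max?: number;
}

function columnStats(cells: string[]): ColumnStats {
  const nonEmpty = cells.filter((c) => c.trim() !== "");
  // Keep only finite parses: NaN and Infinity are excluded from aggregation.
  const numbers = nonEmpty
    .map((c) => parseFloat(c))
    .filter((x) => isFinite(x));

  // Numeric when more than 50% of non-empty cells parse as numbers.
  const numeric =
    nonEmpty.length > 0 && numbers.length / nonEmpty.length > 0.5;
  const base = {
    count: cells.length,
    unique: new Set(nonEmpty).size,
    numeric,
  };
  if (!numeric) return base;

  const sum = numbers.reduce((a, b) => a + b, 0);
  return {
    ...base,
    sum,
    average: sum / numbers.length,
    min: numbers.reduce((a, b) => Math.min(a, b), Infinity),
    max: numbers.reduce((a, b) => Math.max(a, b), -Infinity),
  };
}
```

For the column `["10", "20", "", "abc", "30"]`, three of the four non-empty cells parse as numbers, so the sketch treats it as numeric with sum 60 and average 20.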