
About

Misaligned columns in CSV data cause silent failures in ETL pipelines, database imports, and analytics dashboards. A single unescaped delimiter inside a field shifts every subsequent column, producing corrupt records that pass validation but yield wrong results. This tool parses your CSV using a strict RFC 4180 state machine that correctly handles quoted fields, escaped double-quotes (""), and embedded newlines. It reports the column count per row, flags inconsistencies where row i yields n_i columns with n_i ≠ n_1, and auto-detects the delimiter from comma, semicolon, tab, and pipe characters.
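The quote-aware counting described above can be sketched as a small state machine. This is an illustration, not the tool's published source; `countColumns` is a hypothetical helper name, and it assumes any embedded newlines are already part of the record string.

```javascript
// Count columns in one CSV record, RFC 4180 style: delimiters inside
// quoted fields do not separate columns, and "" escapes a literal quote.
function countColumns(record, delimiter = ",") {
  let inQuotes = false;
  let columns = 1; // a record always has at least one column
  for (let i = 0; i < record.length; i++) {
    const ch = record[i];
    if (inQuotes) {
      if (ch === '"') {
        if (record[i + 1] === '"') i++; // "" = escaped literal quote
        else inQuotes = false;          // closing quote ends the field
      }
    } else if (ch === '"') {
      inQuotes = true;                  // opening quote starts quoted mode
    } else if (ch === delimiter) {
      columns++;                        // only unquoted delimiters count
    }
  }
  return columns;
}

console.log(countColumns('"San Francisco, CA",94105')); // 2: quoted comma ignored
```

This is why "San Francisco, CA" stays one column even with a comma delimiter: the comma is consumed while the parser is in its quoted state.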

The tool approximates header presence by checking whether the first row contains exclusively non-numeric strings while subsequent rows contain mixed or numeric data. Limitation: auto-detection fails on files where every field is text or where multiple candidate delimiters appear with equal frequency. In such cases, select the delimiter manually. Files up to 50 MB are supported client-side with no server upload.


Formulas

Column counting follows a finite-state parser. For each row, the parser transitions between states based on the current character and the active state. The column count for row i is:

cols_i = delimiters_i + 1

where delimiters_i counts only unquoted delimiter characters in row i. The delimiter auto-detection score for candidate d is computed as:

score(d) = consistency(d) × frequency(d)

where consistency measures what fraction of sampled rows produce the same column count, and frequency is the mean count of d per row. The candidate with the highest score is selected. Inconsistency is flagged when:

cols_i ≠ mode(cols_1, cols_2, …, cols_N)

where N is the total row count, mode returns the most frequent column count, cols_i is the column count for row i, and rows deviating from the mode are reported as inconsistent.
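The scoring and flagging formulas above can be sketched directly. Function names (`scoreDelimiter`, `flagInconsistentRows`, `mode`) are illustrative, and this sketch counts raw delimiter occurrences per row without quote handling, which the real parser would apply first.

```javascript
// Most frequent value in an array (the mode of the column counts).
function mode(values) {
  const tally = new Map();
  for (const v of values) tally.set(v, (tally.get(v) || 0) + 1);
  return [...tally.entries()].sort((a, b) => b[1] - a[1])[0][0];
}

// score(d) = consistency(d) × frequency(d) over the sampled rows.
function scoreDelimiter(rows, d) {
  const counts = rows.map(r => r.split(d).length - 1);            // d per row
  const freq = counts.reduce((a, b) => a + b, 0) / rows.length;   // frequency(d)
  const colCounts = counts.map(c => c + 1);                       // cols_i = delimiters_i + 1
  const m = mode(colCounts);
  const consistency = colCounts.filter(c => c === m).length / rows.length;
  return consistency * freq;
}

// Report row indices where cols_i ≠ mode(cols_1, …, cols_N).
function flagInconsistentRows(colCounts) {
  const m = mode(colCounts);
  return colCounts.flatMap((c, i) => (c !== m ? [i] : []));
}
```

Multiplying by frequency breaks ties in favor of delimiters that actually appear: a candidate that never occurs has perfectly consistent counts (always 1 column) but a score of zero.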

Reference Data

Delimiter  Name            Common Use                        Unicode  Standard      Risk Factor
,          Comma           International CSV default         U+002C   RFC 4180      Breaks on European decimals (3,14)
;          Semicolon       European CSV (Excel EU locale)    U+003B   Non-standard  Rare in field data
\t         Tab             TSV files, database exports       U+0009   IANA TSV      Invisible character, hard to debug
|          Pipe            Legacy systems, HL7 medical data  U+007C   Non-standard  Conflicts with shell piping
\x1F       Unit Separator  ASCII control character           U+001F   Non-standard  Not human-readable
Common Column Count Expectations

File Type                     Typical Columns      Typical Fields
Standard Address File         5-8                  Name, Street, City, State, Zip, Country
Bank Transaction Export       6-12                 Date, Description, Debit, Credit, Balance, Reference
Web Analytics (GA Export)     10-30                Session, Source, Medium, Page, Bounce Rate, etc.
eCommerce Product Feed        15-50                SKU, Title, Description, Price, Images, Variants
Scientific Dataset (Tidy)     3-20                 Observation, Variable, Value (per tidy data principles)
US Census PUMS                200+                 Microdata with coded variables
Apache Log (CSV-converted)    7-9                  IP, Timestamp, Method, URL, Status, Size, Referrer
CRM Contact Export            20-40                Name, Email, Phone, Company, Tags, Custom Fields
IoT Sensor Readings           4-15                 Timestamp, Sensor ID, Value, Unit, Status
Genomics VCF (tab-delimited)  8 fixed + n samples  CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO

Frequently Asked Questions

How does the parser handle quoted fields?

The parser implements a 4-state finite automaton per RFC 4180. When a double-quote opens a field, all characters including delimiters and line breaks are treated as field content until a closing quote is found. A literal quote inside a quoted field must be escaped as two consecutive double-quotes (""). This means a field like "San Francisco, CA" counts as one column, not two, even with a comma delimiter.

Why might delimiter auto-detection fail?

Auto-detection samples the first 20 rows and scores each candidate delimiter by consistency (same column count across rows) multiplied by frequency. If your file uses an uncommon delimiter or has irregular structure in the first 20 rows, the heuristic may fail. In that case, manually select the correct delimiter from the dropdown. Files with only one column, or where every field is quoted with embedded delimiters, are inherently ambiguous.

What causes inconsistent column counts?

Common causes include: unescaped delimiters inside field values (e.g., commas in addresses without quoting), missing trailing delimiters on some rows, extra blank fields appended by spreadsheet software, or corrupted lines from truncated writes. The tool flags each row that deviates from the mode column count so you can inspect the specific problematic lines.

How are character encodings handled?

The FileReader API reads the file as UTF-8 text by default. A UTF-8 BOM (byte order mark, 0xEF 0xBB 0xBF) is stripped automatically before parsing. If your file uses Latin-1 or Windows-1252 encoding, special characters may display incorrectly, but delimiter detection and column counting remain accurate since delimiter characters fall within the ASCII range (U+0000 to U+007F).
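BOM stripping is a one-line check once the file has been decoded to a string. This is a sketch with a hypothetical helper name, not the tool's actual API.

```javascript
// If the decoder preserves the UTF-8 BOM (bytes 0xEF 0xBB 0xBF), it shows
// up as the single code point U+FEFF at position 0 of the decoded string.
// Dropping it prevents the BOM from being glued onto the first header name.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

console.log(stripBom("\uFEFFname,age")); // "name,age"
```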

How are large files handled?

Files up to 50 MB are accepted. For files exceeding 1 MB, parsing is chunked using setTimeout batches of 100,000 characters to prevent the browser main thread from freezing. A progress indicator shows parsing advancement. Memory is released after analysis completes. For files larger than 50 MB, consider splitting with a command-line tool like GNU split before analysis.
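The chunking scheme described above can be sketched as follows. Names (`CHUNK_SIZE`, `parseInChunks`) are illustrative, not the tool's actual API.

```javascript
const CHUNK_SIZE = 100_000; // characters per batch

// Process `text` in batches, yielding to the event loop between batches
// via setTimeout(…, 0) so the main thread (and any progress indicator)
// stays responsive. Calls done() after the final batch.
function parseInChunks(text, handleChunk, done) {
  let offset = 0;
  function step() {
    const end = Math.min(offset + CHUNK_SIZE, text.length);
    handleChunk(text.slice(offset, end), offset);
    offset = end;
    if (offset < text.length) setTimeout(step, 0); // yield between batches
    else done();
  }
  step();
}
```

A real implementation would also cut batches on row boundaries, or carry the trailing partial line into the next batch, so a record is never split mid-field.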

How does header detection work?

The heuristic checks whether the first row consists entirely of non-numeric, non-empty strings while at least 30% of cells in rows 2 through 6 contain numeric or date-like values. This is a probabilistic guess. If your data has text-only columns or numeric headers (e.g., year codes), the detection may be incorrect. You can override it manually using the header toggle.
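The heuristic translates to a short predicate over the already-split rows. This is a sketch: `looksLikeHeader` is a hypothetical name, and the date-like test is simplified to ISO-style dates rather than whatever pattern set the tool actually uses.

```javascript
// Guess header presence: first row all non-empty, non-numeric; at least
// 30% of cells in rows 2-6 numeric or date-like. Rows are arrays of
// string cells (already split on the detected delimiter).
function looksLikeHeader(rows) {
  const isNumeric = s => s !== "" && !Number.isNaN(Number(s));
  const isDateLike = s => /^\d{4}[-/]\d{1,2}[-/]\d{1,2}/.test(s); // simplified
  const first = rows[0];
  if (!first || !first.every(c => c !== "" && !isNumeric(c))) return false;
  const body = rows.slice(1, 6).flat(); // sample cells from rows 2-6
  if (body.length === 0) return false;
  const hits = body.filter(c => isNumeric(c) || isDateLike(c)).length;
  return hits / body.length >= 0.3;
}
```

Note the failure modes the answer above warns about: a year-code header like "2023,2024" is numeric and defeats the first check, while an all-text body never reaches the 30% threshold.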