Count CSV Columns
Count the number of columns in a CSV file instantly. Supports custom delimiters, delimiter auto-detection, quoted fields, and per-row column analysis.
About
Misaligned columns in CSV data cause silent failures in ETL pipelines, database imports, and analytics dashboards. A single unquoted delimiter inside a field shifts every subsequent column in that row, producing corrupt records that can pass validation yet yield wrong results. This tool parses your CSV with a strict RFC 4180 state machine that correctly handles quoted fields, escaped double quotes (""), and embedded newlines. It reports the column count for each row, flags rows whose count differs from the file's most common column count, and auto-detects the delimiter from comma, semicolon, tab, and pipe characters.
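A minimal Python sketch of the kind of RFC 4180 state machine described above, assuming the file is already decoded into a string. The function name `count_columns` is hypothetical and this is not the tool's actual source. A double quote opens a quoted field only at the start of a field, `""` inside quotes is a literal quote, and delimiters or line breaks inside quotes do not end the field. Skipping blank lines is a design choice of this sketch.

```python
from __future__ import annotations


def count_columns(text: str, delimiter: str = ",") -> list[int]:
    """Return the number of fields in each record of `text` (hypothetical helper).

    Sketch of an RFC 4180-style state machine:
    - a double quote opens a quoted field only at the start of a field,
    - inside quotes, "" is an escaped quote and delimiters/newlines are literal,
    - CRLF, LF, or CR outside quotes ends a record; blank lines are skipped.
    """
    counts = []
    fields = 1              # every non-empty record has at least one field
    in_quotes = False
    at_field_start = True
    record_started = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    i += 1              # "" -> literal quote, still inside the field
                else:
                    in_quotes = False   # closing quote
        elif ch == '"' and at_field_start:
            in_quotes = True
            at_field_start = False
            record_started = True
        elif ch == delimiter:
            fields += 1                 # unquoted delimiter starts a new field
            at_field_start = True
            record_started = True
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                  # treat CRLF as a single line break
            if record_started:
                counts.append(fields)
            fields = 1
            at_field_start = True
            record_started = False
        else:
            at_field_start = False
            record_started = True
        i += 1
    if record_started:
        counts.append(fields)           # last record without a trailing newline
    return counts
```

For example, `count_columns('a,b,c\n1,"x,y",3\n')` counts three columns in both rows, because the comma inside `"x,y"` is quoted.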
The tool estimates whether a header row is present by checking whether the first row contains only non-numeric strings while subsequent rows contain mixed or numeric data. Limitations: header detection fails on files where every field is text, and delimiter auto-detection is unreliable when several candidate delimiters appear with equal frequency. In the latter case, select the delimiter manually. Files up to 50 MB are processed entirely client-side, with no server upload.
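The header heuristic can be sketched in Python as follows, assuming rows are already split into fields. The names `looks_numeric` and `has_header` are hypothetical, and treating anything `float()` accepts as numeric is an assumption of this sketch, not the tool's documented rule.

```python
from __future__ import annotations


def looks_numeric(value: str) -> bool:
    """Treat anything float() accepts (e.g. "34", "-1.5e3") as numeric."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def has_header(rows: list[list[str]]) -> bool:
    """Guess whether rows[0] is a header (hypothetical helper).

    Heuristic: the first row holds only non-numeric strings, and at least one
    value in the later rows is numeric.
    """
    if len(rows) < 2 or not any(v.strip() for v in rows[0]):
        return False
    first, rest = rows[0], rows[1:]
    header_all_text = all(not looks_numeric(v) for v in first if v.strip())
    body_has_numbers = any(looks_numeric(v) for row in rest for v in row)
    return header_all_text and body_has_numbers
```

With `[["name", "age"], ["Ann", "34"]]` the sketch reports a header. With `[["city", "country"], ["Oslo", "Norway"]]` it does not, which is exactly the all-text limitation noted above. Python's standard library also offers `csv.Sniffer.has_header()`, which applies its own, different heuristic.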
Formulas
Column counting uses a finite-state parser: for each row, it transitions between states (such as inside or outside a quoted field) based on the current character and the active state. The column count for row $i$ is:
$$\text{cols}_i = \text{delimiters}_i + 1$$
where $\text{delimiters}_i$ counts only unquoted delimiter characters in row $i$. The delimiter auto-detection score for candidate $d$ is computed as:
$$\text{score}(d) = \text{consistency}(d) \times \text{frequency}(d)$$
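where $\text{consistency}(d)$ is the fraction of sampled rows whose column count matches the most common count under $d$, and $\text{frequency}(d)$ is the mean number of occurrences of $d$ per row. The candidate with the highest score is selected. Row $i$ is flagged as inconsistent when:
$$\text{cols}_i \neq \text{mode}(\text{cols}_1, \ldots, \text{cols}_N)$$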
where $N$ is the total row count, $\text{mode}$ returns the most frequent column count, and $\text{cols}_i$ is the column count for row $i$. Every row that deviates from the mode is reported as inconsistent.
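Delimiter scoring and the mode-based inconsistency check can be sketched with Python's standard `csv` module doing the quote-aware parsing. Several details are assumptions of this sketch rather than the tool's documented behavior: the score is the product of consistency and frequency, frequency is measured as unquoted delimiters per row ($\text{cols}_i - 1$), every row is scored instead of a sample, and ties go to the candidate listed first. All function names are hypothetical.

```python
from __future__ import annotations

import csv
import io
from collections import Counter

# Candidate delimiters named in the About section: comma, semicolon, tab, pipe.
CANDIDATES = [",", ";", "\t", "|"]


def row_counts(text: str, delimiter: str) -> list[int]:
    """Column count per non-empty record; csv.reader handles quotes and embedded newlines."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return [len(row) for row in reader if row]


def modal_count(counts: list[int]) -> int:
    """Most frequent column count (the first one seen wins on ties)."""
    return Counter(counts).most_common(1)[0][0]


def score(text: str, delimiter: str) -> float:
    counts = row_counts(text, delimiter)
    if not counts:
        return 0.0
    mode = modal_count(counts)
    consistency = sum(c == mode for c in counts) / len(counts)
    # ASSUMPTION: "frequency" = mean unquoted delimiters per row = cols - 1.
    frequency = sum(c - 1 for c in counts) / len(counts)
    # ASSUMPTION: the two factors are combined as a product.
    return consistency * frequency


def detect_delimiter(text: str) -> str:
    # max() returns the first candidate among equal scores.
    return max(CANDIDATES, key=lambda d: score(text, d))


def inconsistent_rows(counts: list[int]) -> list[tuple[int, int]]:
    """1-based (row, cols) pairs whose column count differs from the mode."""
    if not counts:
        return []
    mode = modal_count(counts)
    return [(i, c) for i, c in enumerate(counts, start=1) if c != mode]
```

For the text `"name;city;zip\nAnn;Oslo;0150\nBo;Bergen;5003\nCy;Trondheim\n"`, only `;` scores above zero, so it is selected, and row 4 (two columns against a mode of three) is reported as inconsistent.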
Reference Data
| Delimiter | Name | Common Use | Unicode | RFC Standard | Risk Factor |
|---|---|---|---|---|---|
| , | Comma | International CSV default | U+002C | RFC 4180 | Breaks on European decimals (3,14) |
| ; | Semicolon | European CSV (Excel EU locale) | U+003B | Non-standard | Rare in field data |
| \t | Tab | TSV files, database exports | U+0009 | IANA TSV | Invisible character, hard to debug |
| \| | Pipe | Legacy systems, HL7 medical data | U+007C | Non-standard | Conflicts with shell piping |
| \x1F | Unit Separator | ASCII control character | U+001F | Non-standard | Not human-readable |

Common Column Count Expectations

| Dataset | Typical Column Count | Typical Fields |
|---|---|---|
| Standard Address File | 5 - 8 | Name, Street, City, State, Zip, Country |
| Bank Transaction Export | 6 - 12 | Date, Description, Debit, Credit, Balance, Reference |
| Web Analytics (GA Export) | 10 - 30 | Session, Source, Medium, Page, Bounce Rate, etc. |
| eCommerce Product Feed | 15 - 50 | SKU, Title, Description, Price, Images, Variants |
| Scientific Dataset (Tidy) | 3 - 20 | Observation, Variable, Value per tidy data principles |
| US Census PUMS | 200+ | Microdata with coded variables |
| Apache Log (CSV-converted) | 7 - 9 | IP, Timestamp, Method, URL, Status, Size, Referrer |
| CRM Contact Export | 20 - 40 | Name, Email, Phone, Company, Tags, Custom Fields |
| IoT Sensor Readings | 4 - 15 | Timestamp, Sensor ID, Value, Unit, Status |
| Genomics VCF (tab-delimited) | 8+ fixed + n samples | CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO |
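
The comma row's risk factor can be reproduced with Python's `csv` module. A writer using the default minimal quoting wraps a field that contains the delimiter in quotes, while a naive `str.split` miscounts the same unquoted text. The helper `to_csv_line` is hypothetical.

```python
import csv
import io


def to_csv_line(fields: list, delimiter: str = ",") -> str:
    """Serialize one row with csv.writer's default minimal quoting (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerow(fields)
    return buf.getvalue()


row = ["pi", "3,14"]                       # European decimal comma inside a field
comma_line = to_csv_line(row)              # 'pi,"3,14"\n'  (field gets quoted)
semicolon_line = to_csv_line(row, ";")     # 'pi;3,14\n'    (no quoting needed)
naive = "pi,3,14".split(",")               # ['pi', '3', '14'] -> 3 columns, wrong
parsed = next(csv.reader(io.StringIO(comma_line)))  # ['pi', '3,14'] -> 2 columns
```

This is also why European exports often switch to the semicolon: the decimal comma no longer collides with the delimiter, so no quoting is required.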