
About

Counting rows in a CSV file is not the same as counting newline characters. RFC 4180 permits quoted fields that contain embedded newlines, so a single logical row can span multiple physical lines. Naive line-counting tools (wc -l, text editor line numbers) overcount in these cases and produce incorrect data inventories. This matters when validating ETL pipelines, checking data exports against source record counts, or estimating processing time for batch operations where the row count drives resource allocation.
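To make the discrepancy concrete, here is a minimal Python sketch (not this tool's code) that counts the same small input two ways: by newline characters, which is what wc -l reports, and with Python's standard csv module, which follows RFC 4180 quoting. The sample data and the helper names physical_line_count and logical_row_count are made up for illustration.

```python
import csv
import io

# Hypothetical sample: the second record's quoted "note" field spans two physical lines.
SAMPLE = 'id,note\r\n1,"line one\r\nline two"\r\n2,plain\r\n'


def physical_line_count(text: str) -> int:
    """Count newline characters, which is what `wc -l` reports."""
    return text.count("\n")


def logical_row_count(text: str) -> int:
    """Count records with the stdlib csv reader, which keeps quoted newlines inside fields."""
    # newline="" disables newline translation so the reader sees the raw line endings.
    reader = csv.reader(io.StringIO(text, newline=""))
    return sum(1 for _ in reader)


if __name__ == "__main__":
    print(physical_line_count(SAMPLE))  # 4
    print(logical_row_count(SAMPLE))    # 3 (header + 2 data rows)
```

The one-row difference comes entirely from the newline inside the quoted field.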

This tool implements a state-machine parser that tracks whether the cursor is inside a quoted field, which lets it distinguish data newlines from row-terminating newlines. It auto-detects the delimiter (comma, semicolon, tab, pipe) by frequency analysis of the first 5 lines, strips BOM prefixes, and excludes trailing empty lines. For files larger than 1 MB, parsing is offloaded to a Web Worker to keep the interface responsive. Limitation: this tool assumes UTF-8 or ASCII encoding. Files in UTF-16 or legacy encodings (Shift-JIS, Windows-1252) may be decoded incorrectly, and multi-byte encodings in particular may produce incorrect counts if their byte sequences alias delimiter or quote characters.
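The BOM and trailing-empty-line handling described above could look roughly like the following Python sketch. It is an assumption about one reasonable approach, not the tool's implementation; prepare_text is a hypothetical name. Decoding strictly as UTF-8 also means unsupported encodings raise an error instead of silently producing a wrong count.

```python
def prepare_text(raw: bytes) -> str:
    """Decode UTF-8 input, drop a leading BOM, and trim trailing empty lines."""
    # "utf-8-sig" removes a leading UTF-8 BOM (EF BB BF) if present and otherwise
    # behaves like "utf-8"; invalid bytes (e.g. many UTF-16 or Windows-1252 files)
    # raise UnicodeDecodeError rather than being miscounted.
    text = raw.decode("utf-8-sig")
    # Strip trailing CR/LF characters so blank lines at the end cannot become
    # phantom rows; the last row is then simply unterminated.
    return text.rstrip("\r\n")


if __name__ == "__main__":
    print(repr(prepare_text(b"\xef\xbb\xbfa,b\n1,2\n\n\n")))  # 'a,b\n1,2'
```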


Formulas

The row counting algorithm uses a finite state machine with two states: UNQUOTED and QUOTED. The transition rules determine whether a newline character increments the row counter R.

R = 0, state = UNQUOTED

For each character c in input:

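$$
\begin{cases}
\text{state} \leftarrow \text{QUOTED} & \text{if } c = \texttt{"} \land \text{state} = \text{UNQUOTED} \\
\text{state} \leftarrow \text{UNQUOTED} & \text{if } c = \texttt{"} \land \text{next} \neq \texttt{"} \land \text{state} = \text{QUOTED} \\
\text{consume both; state stays QUOTED} & \text{if } c = \texttt{"} \land \text{next} = \texttt{"} \land \text{state} = \text{QUOTED} \\
R \leftarrow R + 1 & \text{if } c \in \{\text{LF}, \text{CRLF}\} \land \text{state} = \text{UNQUOTED} \\
\text{skip} & \text{if } c \in \{\text{LF}, \text{CRLF}\} \land \text{state} = \text{QUOTED}
\end{cases}
$$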

After a full traversal, if the file does not end with a newline and the last row contains data, R is incremented by 1. The data row count is then R − 1 if the header toggle is enabled, and R otherwise.
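The transition rules and the end-of-input rule can be sketched in Python as follows. This is an illustrative reimplementation, not the tool's source: count_rows and State are hypothetical names, a standalone CR is also accepted as a terminator (as the line-ending FAQ describes), and trailing empty lines are dropped by tracking a run of empty rows, which is one possible way to implement that exclusion.

```python
from enum import Enum


class State(Enum):
    UNQUOTED = 0
    QUOTED = 1


def count_rows(text: str, has_header: bool = False) -> int:
    """Count logical CSV rows; subtract the header row when has_header is True."""
    rows = 0              # R in the formulas
    state = State.UNQUOTED
    row_has_data = False  # has the current (unterminated) row seen any character?
    empty_run = 0         # consecutive empty rows just terminated (for trailing blanks)
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c == '"':
            if state is State.UNQUOTED:
                state = State.QUOTED
            elif i + 1 < n and text[i + 1] == '"':
                i += 1  # escaped "" pair: consume both, stay QUOTED
            else:
                state = State.UNQUOTED
            row_has_data = True
        elif c in "\r\n" and state is State.UNQUOTED:
            if c == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # CRLF counts as a single terminator
            rows += 1   # R = R + 1
            empty_run = 0 if row_has_data else empty_run + 1
            row_has_data = False
        else:
            # Ordinary data, or a newline inside a quoted field (skipped for counting).
            row_has_data = True
        i += 1
    if row_has_data:
        rows += 1          # last row has data but no trailing newline
    else:
        rows -= empty_run  # drop trailing empty lines
    return max(rows - 1, 0) if has_header else rows


if __name__ == "__main__":
    sample = 'id,note\r\n1,"line one\r\nline two"\r\n2,plain\r\n\r\n'
    print(count_rows(sample), count_rows(sample, has_header=True))  # 3 2
```

Toggling on every quote, as the first rows of the case table do, would give the same count for well-formed input; the explicit lookahead simply mirrors how an escaped pair is described in the FAQ.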

Delimiter auto-detection scores each candidate delimiter d by computing the standard deviation σ of its per-line occurrence counts across the first 5 lines. Among candidates with a non-zero mean count, the one with the lowest σ wins, with ties broken by the higher mean count: a consistent frequency implies structural use rather than incidental appearance in data.
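A minimal Python sketch of this scoring rule follows. It assumes the population standard deviation (statistics.pstdev) and works on physical lines as the text describes, so delimiters inside quoted fields are counted too; detect_delimiter and CANDIDATES are hypothetical names, not the tool's API.

```python
import statistics
from typing import Optional

# Candidates named in the text; caret and tilde are left to manual override.
CANDIDATES = (",", ";", "\t", "|")


def detect_delimiter(text: str, sample_lines: int = 5) -> Optional[str]:
    """Return the candidate with the lowest sigma among those with a non-zero mean."""
    lines = text.splitlines()[:sample_lines]
    if not lines:
        return None
    best: Optional[str] = None
    best_key = None
    for d in CANDIDATES:
        counts = [line.count(d) for line in lines]
        mean = statistics.fmean(counts)
        if mean == 0:
            continue  # never seen in the sample, so not structural
        sigma = statistics.pstdev(counts)
        key = (sigma, -mean)  # lower sigma wins; higher mean breaks ties
        if best_key is None or key < best_key:
            best, best_key = d, key
    return best


if __name__ == "__main__":
    sample = 'name;city;note\nAnna;Paris;"a, b, c"\nBo;Rome;x\n'
    print(repr(detect_delimiter(sample)))  # ';'
```

In the sample, commas appear only inside one quoted field (counts 0, 2, 0), while semicolons appear exactly twice per line (σ = 0), so the semicolon wins.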

Reference Data

| Delimiter | Common Name | Symbol | File Extension | Auto-Detected | Notes |
|---|---|---|---|---|---|
| Comma | CSV | , | .csv | Yes | RFC 4180 standard |
| Semicolon | CSV (European) | ; | .csv | Yes | Common when the locale uses a comma as the decimal separator |
| Tab | TSV | \t | .tsv, .tab | Yes | Rarely appears inside field values |
| Pipe | PSV | \| | .psv, .txt | Yes | Used in medical (HL7) and financial data |
| Caret | Caret-SV | ^ | .txt | No | Rare; use manual override |
| Tilde | Tilde-SV | ~ | .txt | No | Legacy mainframe exports |

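RFC 4180 Key Rules

| Rule | Description |
|---|---|
| Rule 1 | Each record is on a separate line, delimited by a line break (CRLF) |
| Rule 2 | The last record may or may not have an ending line break |
| Rule 3 | An optional header line has the same format as the records |
| Rule 4 | Fields are separated by commas, and each record should contain the same number of fields |
| Rule 5 | Fields may be enclosed in double quotes |
| Rule 6 | Fields containing line breaks, double quotes, or commas should be enclosed in double quotes |
| Rule 7 | A double quote inside a quoted field is escaped as "" |
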
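Python's standard csv writer applies the quoting and escaping rules above by default, which makes it a convenient way to generate test inputs for a row counter. A minimal sketch; to_csv_line is a hypothetical helper name.

```python
import csv
import io


def to_csv_line(fields: list[str]) -> str:
    """Serialize one record with the stdlib writer's default minimal quoting."""
    buf = io.StringIO(newline="")
    # QUOTE_MINIMAL (the default) quotes only fields containing the delimiter,
    # the quote character, or a line-break character; embedded quotes are doubled.
    csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(fields)
    return buf.getvalue()


if __name__ == "__main__":
    print(repr(to_csv_line(["plain", "a,b", 'say "hi"', "two\nlines"])))
    # 'plain,"a,b","say ""hi""","two\nlines"\r\n'
```

Only the last field produces an extra physical line, which is exactly the case a newline-based counter gets wrong.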
Common Row Count Discrepancies

| Cause | Effect on Naive Count | This Tool |
|---|---|---|
| Quoted newlines | Overcounts | Correct |
| Trailing empty line | +1 phantom row | Excluded |
| BOM prefix | First field corrupted | Stripped |
| Mixed line endings (CR/LF/CRLF) | Undercounts or overcounts | Normalized |

Frequently Asked Questions

If this tool reports fewer rows than your text editor shows lines, your CSV likely contains quoted fields with embedded newline characters. Per RFC 4180, a field wrapped in double quotes may contain line breaks as literal data. A text editor counts physical lines (every LF or CRLF), while this tool counts logical rows by tracking whether the parser is inside a quoted field. Apart from trailing empty lines, which this tool excludes, the difference equals the number of newlines embedded in quoted fields.
The tool samples the first 5 lines and counts occurrences of each candidate delimiter (comma, semicolon, tab, pipe). It selects the delimiter with the most consistent count across lines (lowest standard deviation) and a non-zero mean. Override manually when your file has fewer than 3 rows (insufficient sample), when multiple delimiters appear with equal frequency, or when using an uncommon delimiter like caret or tilde.
The header toggle affects the reported data row count. When it is enabled, the tool subtracts 1 from the total row count and reports the header separately; the total number of parsed rows (including the header) is always displayed alongside. This distinction matters when comparing against database record counts, which do not include a header row.
The parser normalizes all line ending styles before counting. It treats standalone CR (old Mac), standalone LF (Unix/Mac), and CRLF (Windows) identically as row terminators. A CRLF sequence is consumed as a single delimiter, not two. This prevents double-counting that affects naive parsers on files transferred between operating systems.
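One way to perform this normalization is a single regular expression that matches CR optionally followed by LF, so a CRLF pair is consumed as one unit before a lone CR is considered. This is a sketch under that assumption, not the tool's code; normalize_line_endings is a hypothetical name.

```python
import re

# "\r\n?" matches CRLF as one unit (the "?" is greedy) or a standalone CR.
_CR_OR_CRLF = re.compile(r"\r\n?")


def normalize_line_endings(text: str) -> str:
    """Rewrite CRLF and standalone CR as LF so each terminator is one character."""
    return _CR_OR_CRLF.sub("\n", text)


if __name__ == "__main__":
    print(repr(normalize_line_endings("a\r\nb\rc\nd")))  # 'a\nb\nc\nd'
```

Normalizing also rewrites line breaks inside quoted fields, which changes field values but not the row count, so it is harmless when only counting.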
There is no hard limit; the practical ceiling depends on your browser's available memory. Files under 1 MB are parsed on the main thread, and files over 1 MB are offloaded to a Web Worker so the UI does not freeze. Files larger than roughly 500 MB may cause memory pressure. In such cases, consider splitting the file or using a streaming command-line tool; note that a plain awk record count, like wc -l, counts physical lines and needs quote-aware handling to match this tool's result.
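For very large files, a streaming count outside the browser avoids holding the whole file in memory. The Python sketch below relies on the standard csv reader, which pulls lines lazily, so memory is bounded roughly by the longest record rather than the file size. count_rows_stream and count_rows_in_file are hypothetical names, and this variant skips every blank line (the reader yields an empty list for them), not only trailing ones.

```python
import csv
import io
from typing import TextIO


def count_rows_stream(stream: TextIO, has_header: bool = False) -> int:
    """Count logical rows from a text stream one record at a time."""
    rows = sum(1 for row in csv.reader(stream) if row)  # [] marks a blank line
    return max(rows - 1, 0) if has_header else rows


def count_rows_in_file(path: str, has_header: bool = False) -> int:
    """Open a file the way the csv module expects and count its rows."""
    # newline="" lets csv handle quoted CR/LF itself; "utf-8-sig" drops a BOM.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return count_rows_stream(f, has_header)


if __name__ == "__main__":
    demo = io.StringIO('id,note\r\n1,"a\r\nb"\r\n2,c\r\n\r\n', newline="")
    print(count_rows_stream(demo, has_header=True))  # 2
```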
Per RFC 4180, a double quote inside a quoted field is represented as two consecutive double quotes (""). The parser's state machine recognizes this pattern: when it is in the QUOTED state and encounters a double quote followed by another double quote, it treats the pair as one literal quote character and remains in the QUOTED state. Only a double quote followed by a delimiter, newline, or end of file transitions back to the UNQUOTED state.
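The same lookahead also recovers field values, not just row boundaries. The Python sketch below splits a single record into fields and turns each "" pair into one literal quote. split_record is a hypothetical name; the sketch is lenient, closing a quoted field on any non-quote follower, whereas a strict parser would reject anything other than a delimiter, newline, or end of input.

```python
def split_record(record: str, delimiter: str = ",") -> list[str]:
    """Split one record into fields, decoding "" inside quoted fields as a quote."""
    fields: list[str] = []
    buf: list[str] = []
    quoted = False
    i = 0
    while i < len(record):
        c = record[i]
        if quoted:
            if c == '"':
                if i + 1 < len(record) and record[i + 1] == '"':
                    buf.append('"')  # escaped pair -> one literal quote, stay QUOTED
                    i += 1
                else:
                    quoted = False   # closing quote -> back to UNQUOTED
            else:
                buf.append(c)        # delimiters and newlines are literal data here
        elif c == '"':
            quoted = True
        elif c == delimiter:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(c)
        i += 1
    fields.append("".join(buf))
    return fields


if __name__ == "__main__":
    print(split_record('1,"say ""hi""",x'))  # ['1', 'say "hi"', 'x']
```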