About

Averaging rows in a CSV file sounds trivial until you encounter quoted fields containing commas, mixed delimiters, thousands of empty cells, or columns where numeric and text data coexist. A naive approach (splitting on commas and dividing) fails on real-world exports from Excel, Google Sheets, or database dumps. This tool implements an RFC 4180-compliant parser that correctly handles quoted strings and escaped double quotes, and auto-detects the delimiter from four candidates: comma, semicolon, tab, and pipe. For each numeric column, it computes the arithmetic mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, excluding non-numeric cells from the count. Getting this wrong means skewed reports, broken dashboards, and flawed business decisions.

The tool processes files up to several hundred thousand rows entirely client-side, offloading larger files to a Web Worker, so no data leaves your browser. It reports per-column mean, median, sum, min, max, and valid count. Limitation: arithmetic uses IEEE 754 double precision, so values that need more than about 15 significant digits are rounded. For currency data, verify that your source does not mix number formats (e.g., 1,234.56 vs 1.234,56).

Formulas

The primary computation is the arithmetic mean per column. Given a column with n valid numeric values:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Where x_i is the i-th valid numeric cell in the column and n is the count of cells that passed numeric validation. Non-numeric or empty cells are excluded from both the sum and the count (unless "Treat empty as zero" is enabled, in which case empty cells contribute 0 to the sum and increment n).
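The counting rule above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the names `toNumberOrNull` and `columnMean`, the `emptyAsZero` option, and the numeric pattern (plain decimal notation, no thousands separators or currency symbols) are all assumptions made for the example.

```javascript
// Assumed definition of "numeric": optional sign, digits, optional fraction/exponent.
const NUMERIC_PATTERN = /^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$/;

// Hypothetical helper: returns the parsed number, or null if the cell is not numeric.
function toNumberOrNull(cell) {
  const text = String(cell).trim();
  return NUMERIC_PATTERN.test(text) ? Number(text) : null;
}

// Mean of one column. Non-numeric cells never count; empty cells count as 0
// only when emptyAsZero is true.
function columnMean(cells, options = {}) {
  const emptyAsZero = options.emptyAsZero === true;
  let sum = 0;
  let n = 0;
  for (const cell of cells) {
    const text = String(cell ?? "").trim();
    if (text === "") {
      if (emptyAsZero) n += 1; // adds 0 to the sum, increments n
      continue;
    }
    const value = toNumberOrNull(text);
    if (value === null) continue; // text, dates, booleans: excluded from sum and n
    sum += value;
    n += 1;
  }
  return n === 0 ? null : sum / n; // null when no cell qualified
}

console.log(columnMean(["10", "", "abc", "20"]));                        // 15
console.log(columnMean(["10", "", "abc", "20"], { emptyAsZero: true })); // 10
```

With the empty cell skipped the mean is 30 / 2; with it counted as zero the same sum is divided by 3, which is why the option lowers the mean for sparse columns.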

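The standard deviation is computed over the same n valid values (population form, dividing by n):

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$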
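Reading the standard deviation formula above as the population form (the sum of squared deviations divided by n, not n − 1), a direct translation might look like the following. The function name `populationStdDev` is hypothetical, and the input is assumed to be an array of already-validated numbers.

```javascript
// Population standard deviation: sqrt of the mean squared deviation from the mean.
function populationStdDev(values) {
  const n = values.length;
  if (n === 0) return null; // nothing to measure
  const mean = values.reduce((acc, x) => acc + x, 0) / n;
  const sumSquares = values.reduce((acc, x) => acc + (x - mean) ** 2, 0);
  return Math.sqrt(sumSquares / n);
}

console.log(populationStdDev([2, 4, 4, 4, 5, 5, 7, 9])); // 2
```

In the example the mean is 5, the squared deviations sum to 32, and 32 / 8 = 4, so the result is exactly 2.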

Median is computed by sorting all n valid values and selecting the middle element. For even n, it is the mean of the two central values.
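A short sketch of that median rule, assuming the input is an array of already-validated numbers (the name `columnMedian` is illustrative). Note the explicit numeric comparator: JavaScript's default `sort()` compares values as strings, which would place 10 before 9.

```javascript
// Median: middle element of the sorted values, or the mean of the two
// central elements when the count is even.
function columnMedian(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b); // copy, then sort numerically
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

console.log(columnMedian([5, 1, 3]));    // 3
console.log(columnMedian([4, 1, 3, 2])); // 2.5
```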

Delimiter auto-detection counts occurrences of each candidate delimiter (, ; \t |) in the first 5 lines. The delimiter with the most consistent count across lines wins.
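One way to express this heuristic is shown below. It is a sketch under stated assumptions, not the tool's code: "most consistent" is interpreted as lowest variance of the per-line counts, a delimiter that never appears is disqualified, ties go to the earlier candidate, and comma is the fallback. For brevity the count ignores quoting, so a delimiter inside a quoted field is counted like any other. The names `CANDIDATE_DELIMITERS` and `detectDelimiter` are hypothetical.

```javascript
const CANDIDATE_DELIMITERS = [",", ";", "\t", "|"];

// Picks the candidate whose per-line occurrence count varies least
// across the first `sampleSize` non-empty lines.
function detectDelimiter(text, sampleSize = 5) {
  const lines = text
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, sampleSize);
  if (lines.length === 0) return ","; // nothing to inspect: fall back to comma

  let best = ","; // fallback when no candidate appears at all
  let bestVariance = Infinity;
  for (const delimiter of CANDIDATE_DELIMITERS) {
    const counts = lines.map((line) => line.split(delimiter).length - 1);
    const mean = counts.reduce((a, b) => a + b, 0) / counts.length;
    if (mean === 0) continue; // candidate never occurs in the sample
    const variance =
      counts.reduce((acc, c) => acc + (c - mean) ** 2, 0) / counts.length;
    if (variance < bestVariance) {
      bestVariance = variance;
      best = delimiter;
    }
  }
  return best;
}

console.log(JSON.stringify(detectDelimiter("name;note\nA;x,y\nB;z"))); // ";"
```

In the example the comma appears in only one of three lines, so its counts vary, while the semicolon appears exactly once per line and wins with zero variance.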

Reference Data

Statistic	Symbol	Formula	Use Case	Sensitivity to Outliers
Arithmetic Mean	x̄	(1/n) ∑ x_i	General average, revenue per row	High
Median	x̃	Middle value when sorted	Salary analysis, skewed distributions	Low
Sum	Σ	∑x_i	Total revenue, total units	High
Minimum	min	min(x₁, …, x_n)	Lowest price, floor detection	Extreme
Maximum	max	max(x₁, …, x_n)	Peak value, ceiling detection	Extreme
Count (Valid)	n	Non-empty numeric cells	Data completeness audit	None
Standard Deviation	σ	√(∑(x_i − x̄)² / n)	Spread of values, quality control	High
Delimiter: Comma	,	ASCII 44	Default CSV (RFC 4180)	-
Delimiter: Semicolon	;	ASCII 59	European Excel exports	-
Delimiter: Tab	\t	ASCII 9	TSV files, database dumps	-
Delimiter: Pipe	|	ASCII 124	Legacy systems, log files	-
IEEE 754 Double	-	64-bit float	All JS arithmetic	15-17 significant digits
RFC 4180	-	CSV standard	Quoted fields, CRLF line breaks	-
Empty Cell Handling: Skip	-	Exclude from n	Sparse data, optional fields	-
Empty Cell Handling: Zero	-	Treat as 0	Dense data, required fields	Lowers mean

Frequently Asked Questions

The parser implements RFC 4180 rules. Any field wrapped in double quotes is treated as a single value, regardless of commas, semicolons, or line breaks inside the quotes. Escaped quotes (two consecutive double-quote characters "") are collapsed to a single quote. This means exports from Excel or Google Sheets with addresses like "123 Main St, Suite 4" are parsed correctly as one field.
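The quoting rules described here map naturally onto a small character-by-character state machine. The sketch below is illustrative only and is not the tool's parser: `parseCsv` is a hypothetical name, it assumes the whole file is already in memory as a string, and it is lenient about malformed input (a stray quote in the middle of an unquoted field is kept as a literal character).

```javascript
// Parses CSV text into an array of rows, each an array of string fields.
function parseCsv(text, delimiter = ",") {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote: "" collapses to "
        i += 2;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
        i += 1;
      } else {
        field += ch; // delimiters and line breaks are literal inside quotes
        i += 1;
      }
      continue;
    }
    if (ch === '"' && field === "") {
      inQuotes = true; // opening quote at the start of a field
    } else if (ch === delimiter) {
      row.push(field);
      field = "";
    } else if (ch === "\r" && text[i + 1] === "\n") {
      // CR of a CRLF pair: skip it, the LF ends the row on the next pass
    } else if (ch === "\n" || ch === "\r") {
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
    i += 1;
  }
  // Flush the last row when the text does not end with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

const sample = 'id,address\r\n1,"123 Main St, Suite 4"\r\n2,"She said ""hi"""\r\n';
console.log(parseCsv(sample));
```

The address in the second row comes back as the single field `123 Main St, Suite 4`, and the doubled quotes in the third row collapse to `She said "hi"`.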

By default, non-numeric cells (text, booleans, dates, empty strings) are skipped. They do not contribute to the sum or the count. This means the denominator n reflects only cells that contained valid numbers. You can toggle "Treat empty as zero" to count blank cells as 0, which lowers the mean for sparse columns.

The tool examines the first 5 rows of your data and counts occurrences of four candidate delimiters: comma, semicolon, tab, and pipe. It selects the one whose count per row has the lowest variance (most consistent). If auto-detection fails (e.g., single-column data), it defaults to comma. You can always manually select a delimiter from the dropdown to override.

All processing is 100% client-side. No data is uploaded. For files exceeding 10,000 rows, the tool offloads parsing and computation to a Web Worker to keep the UI responsive. Practically, files up to several hundred thousand rows work well in modern browsers. Memory is the constraint: a 50 MB CSV file requires roughly 100-200 MB of browser heap during parsing.

JavaScript uses IEEE 754 double-precision floating-point, which provides approximately 15-17 significant decimal digits. If your CSV contains values like 9,999,999,999,999,999 the least significant digits may be rounded. Excel uses the same IEEE 754 standard but occasionally applies different rounding heuristics for display. For financial data requiring exact decimal arithmetic, verify results against a decimal-precision tool.
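The rounding is easy to reproduce in any JavaScript console. The snippet below is only a demonstration; `survivesRoundTrip` is a hypothetical helper that checks whether a digit string comes back unchanged after being parsed as a number.

```javascript
// Largest integer n for which both n and n + 1 are exactly representable.
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991

// True when parsing the digits and printing them back gives the same text.
function survivesRoundTrip(digits) {
  return String(Number(digits)) === digits;
}

// The 16-digit value from the text, with thousands separators removed.
const sixteenDigits = "9,999,999,999,999,999".replace(/,/g, "");
console.log(Number(sixteenDigits));                // 10000000000000000
console.log(survivesRoundTrip(sixteenDigits));     // false
console.log(survivesRoundTrip("999999999999999")); // true (15 digits)

// Decimal fractions are affected too, which matters for currency sums.
console.log(0.1 + 0.2); // 0.30000000000000004
```

The 16-digit value exceeds `Number.MAX_SAFE_INTEGER`, so it is silently rounded to the nearest representable double before any averaging takes place.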

When "First row is header" is checked (default), the first row is used for column labels and excluded from all calculations. If the first row contains numbers and you forget to uncheck this option, those values will be lost from the statistics. The tool shows a preview of the first 5 parsed rows so you can verify the header detection is correct before computing.
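The effect of the header option can be illustrated with a small helper. This is a sketch with assumed names: `splitHeader` is hypothetical, and the generated `Column 1`, `Column 2` labels used when the option is off are an assumption, since the text does not say how the tool labels headerless columns.

```javascript
// Separates column labels from data rows according to the header option.
function splitHeader(rows, firstRowIsHeader = true) {
  if (rows.length === 0) return { labels: [], dataRows: [] };
  if (firstRowIsHeader) {
    // First row becomes labels and is excluded from all calculations.
    return { labels: rows[0], dataRows: rows.slice(1) };
  }
  // Assumed fallback labels; every row is treated as data.
  const labels = rows[0].map((_, index) => `Column ${index + 1}`);
  return { labels, dataRows: rows };
}

// A headerless file: leaving the option checked silently drops the first row.
const rows = [["10", "20"], ["30", "40"]];
console.log(splitHeader(rows, true).dataRows.length);  // 1
console.log(splitHeader(rows, false).dataRows.length); // 2
```

With the option left on for this headerless file, the values 10 and 20 become labels and never reach the statistics, which is the situation the preview is meant to help you catch.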