About

Averaging columns in a CSV file sounds trivial until you encounter missing cells, mixed data types, inconsistent delimiters, or floating-point drift across 100,000+ rows. A naive sum-and-divide in a spreadsheet silently produces wrong results when blank cells are counted as zeros or when string values pollute a numeric column. This tool parses your CSV client-side, auto-detects the delimiter, identifies which columns are numeric, and computes the arithmetic mean x̄ using the Kahan summation algorithm to minimize IEEE 754 rounding error. It reports the count n of valid values per column, flags skipped non-numeric cells, and lets you choose whether missing entries are excluded or treated as 0.

The tool handles edge cases that break spreadsheets: quoted fields containing commas, UTF-8 BOM markers, trailing newlines, and inconsistent row lengths. Files never leave your browser. For datasets exceeding 10,000 rows, computation offloads to a Web Worker so the interface remains responsive. Results can be exported as a new CSV or copied to clipboard. Note: this tool computes only the arithmetic mean; statistics such as variance and median require different algorithms not covered here.


Formulas

The arithmetic mean of a numeric column with n valid (non-missing, non-text) values is computed as:

x̄ = (1/n) ∑ᵢ₌₁ⁿ xᵢ = (x1 + x2 + ⋯ + xn) / n

Where xi is the i-th valid numeric cell in the column, and n is the count of valid cells (excluding blanks and text when the skip policy is active).
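In code, the skip-policy mean reduces to a single pass that ignores blanks and text. A minimal sketch, assuming string cells; the function name is illustrative, and plain Number() coercion stands in for the tool's fuller detection rules:

```javascript
// Mean under the Skip policy: blanks and non-numeric cells are
// excluded from both the sum and the count n.
function meanSkippingBlanks(cells) {
  let sum = 0;
  let n = 0;
  for (const cell of cells) {
    if (typeof cell !== "string" || cell.trim() === "") continue; // blank: skipped
    const v = Number(cell);                                       // "abc" -> NaN
    if (!Number.isFinite(v)) continue;                            // text: skipped
    sum += v;
    n += 1;
  }
  return n > 0 ? sum / n : NaN; // NaN when no valid values exist
}
```

With input ["10", "20", ""] this returns 15, matching the Skip-mode example in the FAQ below.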

To avoid floating-point drift, summation uses the Kahan compensated algorithm:

sum โ† 0, c โ† 0
for each xi:
y = xi โˆ’ c
t = sum + y
c = (t โˆ’ sum) โˆ’ y
sum = t

Where c is the compensation accumulator that captures low-order bits lost during addition. The final mean is sum ÷ n. This bounds the total rounding error to O(1) ULP regardless of n, compared to O(n) for naive summation.
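The pseudocode above translates line-for-line into JavaScript. A sketch, not the tool's actual implementation:

```javascript
// Kahan compensated summation, then division by the count.
function kahanMean(values) {
  let sum = 0;
  let c = 0; // compensation: low-order bits lost in earlier additions
  for (const x of values) {
    const y = x - c;   // subtract previously lost bits
    const t = sum + y; // big + small: low bits of y may be dropped here
    c = (t - sum) - y; // recover exactly what was dropped
    sum = t;
  }
  return sum / values.length;
}
```

Note that the expression (t − sum) − y must not be algebraically simplified to 0; an optimizer that reassociates floating-point math would destroy the compensation.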

Reference Data

Delimiter   Symbol   Common Sources                        Auto-Detected
Comma       ,        Excel (US/UK), Google Sheets export   Yes
Semicolon   ;        Excel (EU locales), SAP exports       Yes
Tab         \t       Database dumps, TSV files             Yes
Pipe        |        Legacy mainframe exports, log files   Yes
Missing Value Policies

Policy          Behavior                            Mode
Skip            Cell excluded from count and sum    Default
Treat as Zero   Cell counted, value = 0             Optional
Numeric Detection Rules

Type                  Example     Rule                                    Detected
Integer               42          Matches regex ^-?\d+$                   Yes
Decimal (dot)         3.14        Matches ^-?\d+\.\d+$                    Yes
Decimal (comma)       3,14        When delimiter ≠ comma                  Yes
Scientific            1.5e-3      Matches ^-?\d+\.?\d*[eE][+-]?\d+$       Yes
Currency prefix       $100        Stripped before parsing                 Yes
Percentage            45%         Suffix stripped, value kept as-is       Yes
Thousands separator   1,000,000   Commas stripped if delimiter ≠ comma    Yes
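The detection rules above can be sketched as a single parsing function. This is illustrative, not the tool's code: the regexes come from the table, but the ordering of the comma rules (thousands groups checked before decimal comma) is an assumption made to resolve the ambiguity between "1,000" and "3,14":

```javascript
// Parse one cell under the table's detection rules.
// Returns a number, or null when the cell is classified as text.
function parseCell(raw, delimiterIsComma) {
  let s = raw.trim().replace(/^\$/, "");      // currency prefix stripped
  if (s.endsWith("%")) s = s.slice(0, -1);    // percent suffix stripped, value kept as-is
  if (!delimiterIsComma) {
    if (/^-?\d{1,3}(,\d{3})+$/.test(s)) {
      s = s.replace(/,/g, "");                // thousands separators: 1,000,000 -> 1000000
    } else if (/^-?\d+,\d+$/.test(s)) {
      s = s.replace(",", ".");                // decimal comma: 3,14 -> 3.14
    }
  }
  if (/^-?\d+$/.test(s) ||                    // integer
      /^-?\d+\.\d+$/.test(s) ||               // decimal (dot)
      /^-?\d+\.?\d*[eE][+-]?\d+$/.test(s)) {  // scientific
    return Number(s);
  }
  return null; // text: excluded from the average
}
```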
Precision & Limits

Limit               Value        Note
Max safe integer    2^53 − 1     JavaScript Number limit
Float precision     ~15 digits   IEEE 754 double
Kahan error bound   O(1) ULP     vs O(n) naive

Frequently Asked Questions

How does the tool decide whether a cell is numeric?

Each cell is individually tested against numeric patterns (integers, decimals, scientific notation, currency-prefixed, and percentage-suffixed values). Cells that fail all patterns are classified as text and excluded from the average. The results panel reports the count of skipped non-numeric cells per column so you can verify data quality.
How are blank (missing) cells handled?

The missing-value policy controls this. In Skip mode (default), blank cells are excluded from both the sum and the count n, giving a true mean of present values. In Treat as Zero mode, blanks contribute 0 to the sum and increment n by 1. The choice matters significantly: a column with values 10, 20, and a blank yields a mean of 15 in Skip mode but 10 in Zero mode.
Why does the tool use Kahan summation?

IEEE 754 floating-point addition loses low-order bits when adding a small number to a large running sum. Over 100,000+ rows, naive summation can drift by several units in the last place (ULP). Kahan summation maintains a separate compensation variable c that recaptures these lost bits, keeping total error bounded to O(1) ULP regardless of dataset size.
How is the delimiter auto-detected?

The tool scans the first 5 lines and counts occurrences of comma, semicolon, tab, and pipe. The character with the most consistent count across lines wins. This can fail if your data contains embedded delimiters inside unquoted fields or if the file has fewer than 2 lines. In such cases, override the delimiter manually using the dropdown.
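The heuristic described above can be sketched in a few lines. Assumptions: the scoring (a candidate only counts when its per-line totals agree and are nonzero, ties broken by candidate order) is a plausible reading of "most consistent count", not the tool's exact logic:

```javascript
// Guess the delimiter from the first 5 non-empty lines.
function detectDelimiter(text) {
  const lines = text.split(/\r?\n/).filter(l => l.length > 0).slice(0, 5);
  const candidates = [",", ";", "\t", "|"];
  let best = ",";
  let bestScore = -1;
  for (const d of candidates) {
    const counts = lines.map(l => l.split(d).length - 1); // occurrences per line
    const consistent = counts.every(c => c === counts[0]);
    const score = consistent && counts[0] > 0 ? counts[0] : 0;
    if (score > bestScore) { bestScore = score; best = d; }
  }
  return best;
}
```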
Does the parser handle commas inside quoted fields?

Yes. The parser implements RFC 4180 quoting rules: fields wrapped in double quotes can contain commas, newlines, and escaped double quotes (represented as two consecutive double quotes). This prevents false column splits inside quoted strings.
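The quoting rules amount to a small state machine. A minimal sketch for splitting a single record (embedded newlines, which RFC 4180 also permits inside quoted fields, are omitted here for brevity):

```javascript
// Split one CSV record per RFC 4180 quoting rules:
// a quoted field may contain the delimiter, and "" inside
// a quoted field encodes one literal double quote.
function splitRecord(line, delim) {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"') {
        if (line[i + 1] === '"') { field += '"'; i++; } // escaped quote
        else inQuotes = false;                           // closing quote
      } else {
        field += ch;                                     // delimiter kept literally
      }
    } else if (ch === '"') {
      inQuotes = true;                                   // opening quote
    } else if (ch === delim) {
      fields.push(field);                                // field boundary
      field = "";
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}
```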
Is there a file size limit?

There is no hard limit, but practical performance depends on browser memory. Files under 50 MB typically process in seconds. For files exceeding 10,000 rows, computation moves to a Web Worker to keep the UI responsive. Files over 200 MB may cause browser memory pressure on devices with limited RAM.
Can I use a comma as the decimal separator?

Yes, but only when the file delimiter is not a comma. If the delimiter is semicolon, tab, or pipe, the parser recognizes comma as a decimal separator within numeric fields. If the delimiter is comma, there is no way to distinguish a decimal comma from a field separator, so dot-decimal notation must be used.