Average CSV Columns
Calculate the arithmetic mean of numeric columns in any CSV file. Supports auto-delimiter detection, missing value handling, and export.
About
Averaging columns in a CSV file sounds trivial until you encounter missing cells, mixed data types, inconsistent delimiters, or floating-point drift across 100,000+ rows. A naive sum-and-divide in a spreadsheet silently produces wrong results when blank cells are counted as zeros or when string values pollute a numeric column. This tool parses your CSV client-side, auto-detects the delimiter, identifies which columns are numeric, and computes the arithmetic mean xฬ using the Kahan summation algorithm to minimize IEEE 754 rounding error. It reports the count n of valid values per column, flags skipped non-numeric cells, and lets you choose whether missing entries are excluded or treated as 0.
The tool handles edge cases that break spreadsheets: quoted fields containing commas, UTF-8 BOM markers, trailing newlines, and inconsistent row lengths. Files never leave your browser. For datasets exceeding 10,000 rows, computation offloads to a Web Worker so the interface remains responsive. Results can be exported as a new CSV or copied to clipboard. Note: this tool computes the population mean. If you need sample variance or median, those require different algorithms not covered here.
Formulas
The arithmetic mean of a numeric column with n valid (non-missing, non-text) values is computed as:
Where xi is the i-th valid numeric cell in the column, and n is the count of valid cells (excluding blanks and text when the skip policy is active).
To avoid floating-point drift, summation uses the Kahan compensated algorithm:
for each xi:
y = xi โ c
t = sum + y
c = (t โ sum) โ y
sum = t
Where c is the compensation accumulator that captures low-order bits lost during addition. The final mean is sum รท n. This bounds the total rounding error to O(1) ULP regardless of n, compared to O(n) for naive summation.
Reference Data
| Delimiter | Symbol | Common Sources | Auto-Detected |
|---|---|---|---|
| Comma | , | Excel (US/UK), Google Sheets export | Yes |
| Semicolon | ; | Excel (EU locales), SAP exports | Yes |
| Tab | \t | Database dumps, TSV files | Yes |
| Pipe | | | Legacy mainframe exports, log files | Yes |
| Missing Value Policies | |||
| Skip | - | Cell excluded from count and sum | Default |
| Treat as Zero | 0 | Cell counted, value = 0 | Optional |
| Numeric Detection Rules | |||
| Integer | 42 | Matches regex ^-?\d+$ | Yes |
| Decimal (dot) | 3.14 | Matches ^-?\d+\.\d+$ | Yes |
| Decimal (comma) | 3,14 | When delimiter โ comma | Yes |
| Scientific | 1.5e-3 | Matches ^-?\d+\.?\d*[eE][+-]?\d+$ | Yes |
| Currency prefix | $100 | Stripped before parsing | Yes |
| Percentage | 45% | Suffix stripped, value kept as-is | Yes |
| Thousands separator | 1,000,000 | Commas stripped if delimiter โ comma | Yes |
| Precision & Limits | |||
| Max safe integer | 253 โ 1 | JavaScript Number limit | - |
| Float precision | ~15 digits | IEEE 754 double | - |
| Kahan error bound | O(1) ULP | vs O(n) naive | - |