Analyze CSV
Upload and analyze CSV files with automatic statistics, sorting, filtering, charts, and data export. RFC 4180-compliant parser.
About
CSV remains the dominant interchange format for tabular data, yet its apparent simplicity hides parsing traps: unescaped delimiters, inconsistent quoting, mixed encodings, and ambiguous column types. Feeding malformed CSV into a downstream pipeline without validation risks silent data corruption: truncated rows, shifted columns, or numeric fields miscast as strings. This tool implements a strict RFC 4180 state-machine parser that correctly handles quoted fields, embedded newlines, and escaped double quotes (a double quote inside a quoted field is written as two double quotes). It auto-detects column types and computes descriptive statistics for every numeric column: mean (x̄), standard deviation (σ), median, quartiles Q1 and Q3, skewness, and kurtosis. Categorical columns receive frequency counts and unique-value tallies.
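The quoting rules can be sketched as a small character-level state machine. This is a minimal Python illustration, not the tool's own code: `parse_csv` and its state names are hypothetical, the input is assumed to be an already-decoded string, and bare LF or CR line endings are accepted in addition to the CRLF that RFC 4180 specifies.

```python
def parse_csv(text: str, delimiter: str = ",") -> list[list[str]]:
    """Split CSV text into rows of string fields using RFC 4180 quoting rules."""
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    state = "START"  # START | UNQUOTED | QUOTED | QUOTE_SEEN
    i = 0

    def end_field() -> None:
        row.append("".join(field))
        field.clear()

    def end_row() -> None:
        end_field()
        rows.append(row.copy())
        row.clear()

    while i < len(text):
        ch = text[i]
        if state == "QUOTED":
            if ch == '"':
                state = "QUOTE_SEEN"      # closing quote, or first half of ""
            else:
                field.append(ch)          # delimiters and newlines are literal here
        elif state == "QUOTE_SEEN" and ch == '"':
            field.append('"')             # "" inside quotes is an escaped quote
            state = "QUOTED"
        elif ch == delimiter:
            end_field()
            state = "START"
        elif ch in "\r\n":
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                i += 1                    # treat CRLF as a single row terminator
            end_row()
            state = "START"
        elif state == "QUOTE_SEEN":
            raise ValueError(f"unexpected character after closing quote at offset {i}")
        elif ch == '"':
            if state != "START":
                raise ValueError(f"stray quote in unquoted field at offset {i}")
            state = "QUOTED"
        else:
            field.append(ch)
            state = "UNQUOTED"
        i += 1

    if state == "QUOTED":
        raise ValueError("unterminated quoted field")
    if field or row or state != "START":
        end_row()                         # last row without a trailing newline
    return rows
```

Rejecting stray quotes instead of guessing matches the "strict" behavior described above: malformed input surfaces as an error rather than as a silently shifted column.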
The analyzer supports files up to 50 MB and uses virtual scrolling for datasets exceeding 500 rows. Large files are parsed in a Web Worker so the UI stays responsive. Note that type detection is heuristic: a column is classified as numeric when more than 80% of its values are numeric. Edge cases such as mixed-type columns or locale-specific decimal separators (comma vs. period) may require manual review. Pro tip: if your CSV uses semicolons (common in European Excel exports), delimiter auto-detection handles it, but you can also override the delimiter manually.
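A rough sketch of both heuristics in Python, under assumptions that are not from the tool itself: `detect_delimiter` and `classify_column` are hypothetical names, the delimiter is chosen as the candidate with the same non-zero count on every sampled line, and the 80% threshold is applied to non-empty cells only.

```python
def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Return the candidate that occurs the same non-zero number of times on every sampled line."""
    lines = [ln for ln in sample.splitlines()[:20] if ln.strip()]
    best, best_count = ",", 0
    for d in candidates:
        counts = {ln.count(d) for ln in lines}
        # A consistent per-line count suggests a real column separator.
        if len(counts) == 1:
            (c,) = counts
            if c > best_count:
                best, best_count = d, c
    return best  # falls back to "," when no candidate is consistent


def classify_column(values: list[str], threshold: float = 0.8) -> str:
    """Classify as numeric when more than `threshold` of the non-empty cells parse as numbers."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "empty"
    numeric = 0
    for v in non_empty:
        try:
            float(v)  # accepts "3.14" and "1e5"; rejects a comma decimal such as "3,14"
            numeric += 1
        except ValueError:
            pass
    return "numeric" if numeric / len(non_empty) > threshold else "categorical"
```

The naive counter ignores quoting, so commas inside quoted fields can skew it; Python's standard `csv.Sniffer().sniff(sample, delimiters=",;\t|")` is a more thorough, quote-aware alternative. The `float` check also shows why comma decimals need manual review: a column like `1,5`, `2,0` is classified as categorical.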
Formulas
Descriptive statistics computed for each numeric column:
Arithmetic mean:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$
Skewness (Fisher's):
$$\gamma_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{\sigma^3}$$
Excess kurtosis:
$$\kappa_{\text{excess}} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4}{\sigma^4} - 3$$
Here xᵢ is each observation, n is the count of non-empty values, x̄ is the arithmetic mean, and σ is the population standard deviation. The sample standard deviation s divides by n − 1 instead of n (Bessel's correction). The median is computed by sorted-array indexing: for odd n it is the middle element; for even n it is the average of the two central elements. Quartiles use inclusive linear interpolation, the same convention as Excel's QUARTILE.INC (Hyndman-Fan type 7).
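The formulas translate directly into code. The following Python sketch is an illustration, not the analyzer's implementation: `describe` is a hypothetical name, empty cells are assumed to be removed before the call, and quartiles use linear interpolation at the 0-based position (n − 1)p, the convention of Excel's QUARTILE.INC.

```python
import math


def describe(values: list[float]) -> dict[str, float]:
    """Descriptive statistics for one numeric column (empty cells already removed)."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values for the sample standard deviation")

    mean = sum(xs) / n
    # Central moments m_k = (1/n) * sum((x_i - mean)^k), population form (divide by n).
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    sigma = math.sqrt(m2)             # population standard deviation
    s = math.sqrt(m2 * n / (n - 1))   # sample standard deviation (Bessel's correction)

    def quantile(p: float) -> float:
        # Linear interpolation between the two order statistics around (n - 1) * p.
        h = (n - 1) * p
        lo = math.floor(h)
        hi = min(lo + 1, n - 1)
        return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

    mid = n // 2
    median = xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2

    # Skewness and kurtosis are undefined for a constant column (sigma == 0).
    skew = m3 / sigma**3 if sigma > 0 else math.nan
    kurt = m4 / sigma**4 - 3 if sigma > 0 else math.nan

    return {
        "mean": mean, "sigma": sigma, "s": s, "median": median,
        "q1": quantile(0.25), "q3": quantile(0.75),
        "skewness": skew, "excess_kurtosis": kurt,
    }
```

For the sample 2, 4, 4, 4, 5, 5, 7, 9 this gives x̄ = 5, σ = 2, median 4.5, Q1 = 4 and Q3 = 5.5.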
Reference Data
| Statistic | Symbol | Description | Applicable To |
|---|---|---|---|
| Count | n | Total non-empty values in column | All types |
| Unique | nu | Distinct value count | All types |
| Null/Empty | n∅ | Missing or empty cell count | All types |
| Mean | x̄ | Arithmetic average | Numeric |
| Median | x̃ | Middle value when sorted | Numeric |
| Mode | Mo | Most frequent value | All types |
| Standard Deviation | σ | Spread around the mean (population) | Numeric |
| Sample Std Dev | s | Spread using Bessel's correction (n − 1) | Numeric |
| Minimum | min | Smallest value | Numeric |
| Maximum | max | Largest value | Numeric |
| Sum | Σ | Total of all values | Numeric |
| Range | R | max − min | Numeric |
| Q1 (25th percentile) | Q1 | Lower quartile boundary | Numeric |
| Q3 (75th percentile) | Q3 | Upper quartile boundary | Numeric |
| IQR | Q3 − Q1 | Interquartile range, a robust spread measure | Numeric |
| Skewness | γ1 | Asymmetry of the distribution; 0 = symmetric | Numeric |
| Kurtosis | κ | Tail heaviness; 3 for a normal distribution (excess kurtosis = 0) | Numeric |
| Coefficient of Variation | CV | σ ÷ x̄, expressed as % | Numeric |
| Top Frequency | fmax | Count of the most common value | Categorical |
| Avg. String Length | | Mean character count of text values | String |
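The count-based rows (Count, Unique, Null/Empty, Mode, Top Frequency, Avg. String Length) and the simple numeric extras (Sum, Min, Max, Range, CV) need only a few lines. A minimal Python sketch under stated assumptions: `column_profile` and `numeric_extras` are hypothetical names, whitespace-only cells are treated as empty and excluded from Unique, and ties for the mode go to the value encountered first.

```python
import math
from collections import Counter


def column_profile(cells: list[str]) -> dict[str, object]:
    """Count-based statistics for one column of raw cell strings."""
    values = [c for c in cells if c.strip()]  # assumption: whitespace-only cells count as empty
    counts = Counter(values)
    # Counter.most_common orders equal counts by first occurrence, so ties go to the earliest value.
    mode, top_freq = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "count": len(values),
        "unique": len(counts),
        "null_empty": len(cells) - len(values),
        "mode": mode,
        "top_frequency": top_freq,
        "avg_string_length": sum(len(v) for v in values) / len(values) if values else 0.0,
    }


def numeric_extras(xs: list[float]) -> dict[str, float]:
    """Sum, min, max, range and coefficient of variation (population sigma / mean, in %)."""
    n = len(xs)
    mean = sum(xs) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    lo, hi = min(xs), max(xs)
    return {
        "sum": sum(xs),
        "min": lo,
        "max": hi,
        "range": hi - lo,
        # CV is undefined when the mean is zero.
        "cv_percent": sigma / mean * 100 if mean != 0 else math.nan,
    }
```

IQR follows from the quartiles as Q3 − Q1. Note that CV is only meaningful for ratio-scale data with a positive mean; the sketch returns NaN rather than dividing by zero.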