Compare Two CSV Files
Compare two CSV files side-by-side to find added, removed, and modified rows. Export diff results as CSV with cell-level highlighting.
About
Comparing tabular data manually is error-prone once row counts exceed a few dozen. A single misaligned row shifts every subsequent comparison, producing false positives across the entire sheet. This tool implements a hash-accelerated row matching algorithm that identifies added, removed, and modified rows between two CSV files. It operates at the cell level: when two rows share a key or positional match, each cell is compared independently, so you see exactly which column changed and what the prior value was. The parser is RFC 4180-compliant, correctly handling quoted fields, embedded commas, and escaped double-quote characters ("").
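To make the quoting rules concrete, here is a minimal sketch that uses Python's standard `csv` module as a stand-in for the tool's own parser (the tool's actual implementation is not shown in this page). The `parse_csv` helper and the sample data are illustrative assumptions.

```python
import csv
import io

def parse_csv(text: str, delimiter: str = ",") -> list[list[str]]:
    """Parse CSV text into a list of rows, each a list of string cells."""
    # newline="" disables newline translation so the csv module sees the
    # original line endings, including any inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return [row for row in reader]

# A quoted field with an embedded comma, and a field with escaped quotes ("").
sample = 'id,name,note\r\n1,"Smith, Jane","She said ""hi"""\r\n'
rows = parse_csv(sample)
print(rows)
# [['id', 'name', 'note'], ['1', 'Smith, Jane', 'She said "hi"']]
```

The embedded comma stays inside a single cell, and each doubled quote collapses to one literal quote character.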
Limitations: comparison is performed in-browser and loads both files into memory. Files larger than 50MB may cause slowdowns on low-memory devices. Values are compared as strings, so 1.0 and 1.00 are flagged as different; the trim-whitespace option removes only surrounding whitespace and does not normalize numeric formatting. Row order matters in sequential mode. If your files share a unique identifier column, use key-column mode for order-independent matching.
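String-based comparison with optional normalization can be sketched as follows. The function names `normalize` and `cells_equal` are hypothetical, and the sketch assumes the whitespace option strips leading and trailing whitespace only.

```python
def normalize(cell: str, trim: bool = False, ignore_case: bool = False) -> str:
    """Apply the optional normalizations before comparing (assumed behavior)."""
    if trim:
        cell = cell.strip()      # leading/trailing whitespace only
    if ignore_case:
        cell = cell.casefold()   # case-insensitive comparison
    return cell

def cells_equal(a: str, b: str, trim: bool = False, ignore_case: bool = False) -> bool:
    """Compare two cells as strings after optional normalization."""
    return normalize(a, trim, ignore_case) == normalize(b, trim, ignore_case)

print(cells_equal("1.0", "1.00"))                 # False: different strings
print(cells_equal(" 1.0", "1.0 ", trim=True))     # True: only whitespace differed
print(cells_equal("ABC", "abc", ignore_case=True))  # True
```

If numerically equal values must match, one option is to normalize numeric columns in both files before comparing.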
Formulas
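Row identity in key-column mode is determined by the value in the designated key column k. For each row r in file A, the tool builds a hash map from key value to row:
map_A[r[k]] = r (skipped when r[k] is already present, so the first occurrence wins)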
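The same is done for file B. Rows present in both maps are compared cell-by-cell. The diff status for each key value v is computed as:
status(v) = ADDED if v appears only in file B; REMOVED if v appears only in file A; MODIFIED if v appears in both files and at least one cell differs; unchanged otherwise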
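The key-column procedure described above can be sketched roughly as follows. All function names are hypothetical, the `UNCHANGED` label is an assumed name for rows with no differences, and treating a missing cell as an empty string is likewise an assumption. Header rows are omitted for brevity.

```python
def build_key_map(rows, key_index):
    """Map key value -> row; the first occurrence of a duplicate key wins."""
    key_map = {}
    for row in rows:
        key_map.setdefault(row[key_index], row)
    return key_map

def diff_cells(old, new):
    """Return (column_index, old_value, new_value) for each differing cell."""
    changes = []
    for i in range(max(len(old), len(new))):
        a = old[i] if i < len(old) else ""  # assumption: missing cell == ""
        b = new[i] if i < len(new) else ""
        if a != b:
            changes.append((i, a, b))
    return changes

def diff_by_key(rows_a, rows_b, key_index):
    """Return {key: (status, cell_changes)} for every key in either file."""
    map_a = build_key_map(rows_a, key_index)
    map_b = build_key_map(rows_b, key_index)
    result = {}
    for key, row in map_a.items():
        if key not in map_b:
            result[key] = ("REMOVED", [])
        else:
            changes = diff_cells(row, map_b[key])
            result[key] = ("MODIFIED" if changes else "UNCHANGED", changes)
    for key in map_b:
        if key not in map_a:
            result[key] = ("ADDED", [])
    return result

rows_a = [["1", "Ann", "NY"], ["2", "Bob", "LA"], ["3", "Cy", "SF"]]
rows_b = [["3", "Cy", "SF"], ["1", "Ann", "Boston"], ["4", "Dee", "DC"]]
report = diff_by_key(rows_a, rows_b, key_index=0)
print(report["1"])  # ('MODIFIED', [(2, 'NY', 'Boston')])
```

Row 3 moved from last to first in file B, yet it is reported as unchanged because matching is done by key rather than by position.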
In sequential mode, rows are paired by index i. If file A has m rows and file B has n rows, rows 1 to min(m, n) are compared pairwise. Any remaining rows are marked ADDED if file B is the longer file (n > m) or REMOVED if file A is the longer file (m > n).
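A minimal sketch of sequential pairing follows. The function name is hypothetical, the `UNCHANGED` label is an assumed name, and the sketch assumes that extra rows in file B count as ADDED and extra rows in file A as REMOVED.

```python
def diff_sequential(rows_a, rows_b):
    """Pair rows by index and return one status per row position."""
    common = min(len(rows_a), len(rows_b))
    statuses = [
        "UNCHANGED" if rows_a[i] == rows_b[i] else "MODIFIED"
        for i in range(common)
    ]
    # Leftover rows exist in only one of the two files.
    statuses.extend("REMOVED" for _ in rows_a[common:])
    statuses.extend("ADDED" for _ in rows_b[common:])
    return statuses

rows_a = [["a"], ["b"], ["c"]]
rows_b = [["a"], ["x"], ["b"], ["c"]]  # one row inserted at position 2
print(diff_sequential(rows_a, rows_b))
# ['UNCHANGED', 'MODIFIED', 'MODIFIED', 'ADDED']
```

The example shows why row order matters in this mode: a single inserted row shifts every later pairing, so rows after the insertion point are reported as modified even though their content is unchanged.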
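An FNV-1a hash is used as a fast row-equality pre-check:
hash = 2166136261
for each byte b: hash = ((hash ⊕ b) × 16777619) mod 2^32
Here hash is a 32-bit unsigned integer initialized to the FNV offset basis 2166136261, ⊕ is XOR, and 16777619 is the 32-bit FNV prime. Rows whose hashes differ cannot be identical, which provides fast rejection of non-matching rows before cell-level comparison.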
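For reference, 32-bit FNV-1a can be written in a few lines. The offset basis and prime are the standard FNV constants; the `row_hash` helper, and its use of the unit-separator character to join cells, are illustrative assumptions rather than the tool's documented behavior.

```python
FNV_OFFSET_BASIS = 0x811C9DC5  # 2166136261
FNV_PRIME = 0x01000193         # 16777619

def fnv1a_32(data: bytes) -> int:
    """Standard 32-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV_OFFSET_BASIS
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFF  # keep the result in 32 bits
    return h

def row_hash(row: list[str]) -> int:
    """Hash a whole row (hypothetical helper).

    Cells are joined with the ASCII unit separator so that
    ["ab", "c"] and ["a", "bc"] produce different byte strings.
    """
    return fnv1a_32("\x1f".join(row).encode("utf-8"))

print(hex(fnv1a_32(b"a")))  # 0xe40c292c
print(row_hash(["1", "Ann"]) == row_hash(["1", "Ann"]))  # True
```

Different hashes guarantee the rows differ. Equal hashes only make equality very likely, since 32-bit collisions are possible, so a cautious implementation would still confirm with a cell-by-cell comparison.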
Reference Data
| Feature | Sequential Mode | Key-Column Mode |
|---|---|---|
| Row matching | Position-based (row i vs row i) | By unique key value |
| Detects added rows | Yes (extra rows at end) | Yes (key exists only in File B) |
| Detects removed rows | Yes (extra rows at end) | Yes (key exists only in File A) |
| Detects modified cells | Yes | Yes |
| Handles reordered rows | No - flags as modified | Yes |
| Requires unique column | No | Yes |
| Performance on 10k rows | < 1s | < 2s |
| Duplicate key handling | N/A | First occurrence wins |
| Delimiter support | Comma, semicolon, tab, pipe | Comma, semicolon, tab, pipe |
| Encoding | UTF-8 | UTF-8 |
| Max recommended file size | 50MB | 50MB |
| Header row required | Optional | Required |
| Export format | CSV with status column | CSV with status column |
| Whitespace trimming | Optional | Optional |
| Case-insensitive compare | Optional | Optional |
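The "CSV with status column" export format listed in the table might look like the sketch below. The column name `status`, its position as the first column, and the `export_diff` helper are assumptions for illustration; the tool's actual output layout may differ.

```python
import csv
import io

def export_diff(header, diff_rows):
    """Serialize (status, row) pairs to CSV text with a leading status column."""
    buf = io.StringIO(newline="")
    # The default csv dialect quotes fields only when needed and ends lines with CRLF.
    writer = csv.writer(buf)
    writer.writerow(["status", *header])
    for status, row in diff_rows:
        writer.writerow([status, *row])
    return buf.getvalue()

output = export_diff(
    ["id", "name"],
    [("MODIFIED", ["1", "Smith, Jane"]), ("ADDED", ["4", "Dee"])],
)
print(output)
```

Fields containing the delimiter, such as `Smith, Jane`, are quoted on output so the exported file can be parsed again without shifting columns.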