About

Comparing tabular data manually is error-prone once row counts exceed a few dozen. A single misaligned row shifts every subsequent comparison, producing false positives across the entire sheet. This tool implements a hash-accelerated row-matching algorithm that identifies added, removed, and modified rows between two CSV files. It operates at the cell level: when two rows are matched, either by key or by position, each cell is compared independently, so you see exactly which column changed and what the prior value was. The parser is RFC 4180-compliant, correctly handling quoted fields, embedded commas, and escaped double-quote characters ("").

Limitations: comparison is performed in-browser and loads both files into memory. Files larger than 50 MB may cause slowdowns on low-memory devices. All values, including floating-point numbers, are compared as strings, so 1.0 and 1.00 are flagged as different. The trim-whitespace option removes only leading and trailing spaces; it does not normalize numbers. Row order matters in sequential mode. If your files share a unique identifier column, use key-column mode for order-independent matching.

Formulas

Row identity in key-column mode is determined by the value in the designated key column k. For each row r in file A, the tool builds a hash map:

Map_A[r[k]] → r

The same is done for file B. Rows present in both maps are compared cell-by-cell. The diff status for each row is computed as:

status(key) =
    ADDED      if key ∈ B ∧ key ∉ A
    REMOVED    if key ∈ A ∧ key ∉ B
    MODIFIED   if ∃ j : r_A[j] ≠ r_B[j]
    UNCHANGED  otherwise
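The status rule for key-column mode can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: the names index_by_key and diff_by_key are hypothetical, rows are assumed to be lists of strings, and a shorter row is padded with empty strings before cells are compared.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple

Row = List[str]


def index_by_key(rows: List[Row], k: int) -> Dict[str, Row]:
    """Build the map r[k] -> r, keeping the first row seen for each key."""
    index: Dict[str, Row] = {}
    for r in rows:
        index.setdefault(r[k], r)  # first occurrence wins
    return index


def diff_by_key(rows_a: List[Row], rows_b: List[Row], k: int = 0) -> Dict[str, Tuple[str, List[int]]]:
    """Return {key: (status, changed column indexes)} for two row lists."""
    map_a = index_by_key(rows_a, k)
    map_b = index_by_key(rows_b, k)
    result: Dict[str, Tuple[str, List[int]]] = {}

    for key, r_a in map_a.items():
        if key not in map_b:
            result[key] = ("REMOVED", [])
            continue
        r_b = map_b[key]
        # Compare cell by cell; missing cells are treated as empty strings.
        changed = [
            j
            for j, (a, b) in enumerate(zip_longest(r_a, r_b, fillvalue=""))
            if a != b
        ]
        result[key] = ("MODIFIED" if changed else "UNCHANGED", changed)

    for key in map_b:
        if key not in map_a:
            result[key] = ("ADDED", [])
    return result


file_a = [["1", "alice", "10"], ["2", "bob", "20"], ["3", "carol", "30"]]
file_b = [["3", "carol", "35"], ["1", "alice", "10"], ["4", "dave", "40"]]
for key, (status, cols) in sorted(diff_by_key(file_a, file_b).items()):
    print(key, status, cols)
```

Because lookups go through the two maps, the order of rows in either file does not affect the result.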

In sequential mode, rows are paired by index i. If file A has m rows and file B has n rows, rows 1 to min(m, n) are compared pairwise. Any remaining rows are marked REMOVED if file A is the longer file, or ADDED if file B is.
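Sequential pairing can be sketched the same way. The function name diff_sequential is hypothetical and the code is only an illustration of the rule described here, assuming rows are lists of strings.

```python
from itertools import zip_longest
from typing import List, Tuple

Row = List[str]


def diff_sequential(rows_a: List[Row], rows_b: List[Row]) -> List[Tuple[int, str]]:
    """Pair rows by position (1-based) and return (index, status) tuples."""
    statuses: List[Tuple[int, str]] = []
    # zip_longest yields None for the file that has run out of rows.
    for i, (r_a, r_b) in enumerate(zip_longest(rows_a, rows_b), start=1):
        if r_a is None:
            status = "ADDED"      # row exists only in file B
        elif r_b is None:
            status = "REMOVED"    # row exists only in file A
        elif r_a != r_b:
            status = "MODIFIED"
        else:
            status = "UNCHANGED"
        statuses.append((i, status))
    return statuses


file_a = [["x"], ["y"], ["z"]]
file_b = [["x"], ["q"]]
print(diff_sequential(file_a, file_b))
```

Note that inserting a single row near the top of one file shifts every later pair, so most rows after it are reported as MODIFIED. That is the case key-column mode is meant to avoid.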

The FNV-1a hash is used as a fast row-equality pre-check:

hash = 2166136261
for each byte b: hash = (hash ⊕ b) × 16777619

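Here hash is a 32-bit unsigned integer, so the multiplication wraps modulo 2^32, and ⊕ is XOR. The initial value 2166136261 is the FNV offset basis and 16777619 is the FNV prime. This provides fast rejection of non-matching rows before cell-level comparison: two rows with different hashes are certain to differ, while rows with equal hashes still need a cell-level check because collisions are possible.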
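A minimal Python sketch of 32-bit FNV-1a follows. The hash function itself is the standard algorithm; how a row is turned into bytes is an assumption made for this example (cells joined with the ASCII unit separator and encoded as UTF-8), and the helper names row_hash and rows_definitely_differ are hypothetical.

```python
from typing import List

FNV_OFFSET_BASIS = 2166136261  # 0x811C9DC5
FNV_PRIME = 16777619           # 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Standard 32-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV_OFFSET_BASIS
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFF  # keep the result in 32 bits
    return h


def row_hash(row: List[str]) -> int:
    # Assumption: join cells with the unit separator so that
    # ["ab", "c"] and ["a", "bc"] do not serialize to the same bytes.
    return fnv1a_32("\x1f".join(row).encode("utf-8"))


def rows_definitely_differ(row_a: List[str], row_b: List[str]) -> bool:
    """True means the rows differ; False means a cell-level check is still needed."""
    return row_hash(row_a) != row_hash(row_b)


print(hex(fnv1a_32(b"a")))  # 0xe40c292c
print(rows_definitely_differ(["1", "alice"], ["1", "alicia"]))
```

In JavaScript, the same wrap-around is usually obtained with Math.imul and an unsigned right shift by zero, since plain number multiplication would lose precision.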

Reference Data

Feature	Sequential Mode	Key-Column Mode
Row matching	Position-based (row i vs row i)	By unique key value
Detects added rows	Yes (extra rows at end)	Yes (key exists only in File B)
Detects removed rows	Yes (extra rows at end)	Yes (key exists only in File A)
Detects modified cells	Yes	Yes
Handles reordered rows	No - flags as modified	Yes
Requires unique column	No	Yes
Performance on 10k rows	< 1s	< 2s
Duplicate key handling	N/A	First occurrence wins
Delimiter support	Comma, semicolon, tab, pipe	Comma, semicolon, tab, pipe
Encoding	UTF-8	UTF-8
Max recommended file size	50MB	50MB
Header row required	Optional	Required
Export format	CSV with status column	CSV with status column
Whitespace trimming	Optional	Optional
Case-insensitive compare	Optional	Optional

Frequently Asked Questions

In key-column mode, the tool indexes the first occurrence of each key value. Subsequent rows with the same key are ignored during matching, which means duplicates beyond the first will appear as unmatched (ADDED or REMOVED depending on which file they belong to). If your data contains duplicate keys, consider using sequential mode or deduplicating before comparison.
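If you want to check for duplicate keys before running a comparison, a short script along these lines may help. It is a sketch only: find_duplicate_keys and dedupe_first_wins are hypothetical helper names, and rows are assumed to be lists of strings with the key in column k.

```python
from collections import Counter
from typing import Dict, List

Row = List[str]


def find_duplicate_keys(rows: List[Row], k: int = 0) -> Dict[str, int]:
    """Return {key: count} for every key that appears more than once."""
    counts = Counter(r[k] for r in rows)
    return {key: n for key, n in counts.items() if n > 1}


def dedupe_first_wins(rows: List[Row], k: int = 0) -> List[Row]:
    """Keep only the first row for each key, preserving the original order."""
    seen = set()
    kept: List[Row] = []
    for r in rows:
        if r[k] not in seen:
            seen.add(r[k])
            kept.append(r)
    return kept


rows = [["1", "alice"], ["2", "bob"], ["1", "alice (dup)"]]
print(find_duplicate_keys(rows))
print(dedupe_first_wins(rows))
```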

All cell values are compared as strings by default. The string "1.0" is not identical to "1.00". This is intentional because CSV files often contain formatted numbers where trailing zeros carry meaning (e.g., currency, measurement precision). Enable the "Trim whitespace" option to strip leading/trailing spaces, but numeric normalization is not applied to avoid data loss assumptions.
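The effect of the comparison options can be illustrated with a small sketch. The function names are hypothetical, and two details are assumptions of this example rather than documented behavior: str.strip() removes all leading and trailing whitespace (not only spaces), and case-insensitive comparison is done with str.casefold().

```python
def normalize(cell: str, trim: bool = False, ignore_case: bool = False) -> str:
    """Apply the optional normalizations; numbers are never reinterpreted."""
    if trim:
        cell = cell.strip()
    if ignore_case:
        cell = cell.casefold()
    return cell


def cells_equal(a: str, b: str, trim: bool = False, ignore_case: bool = False) -> bool:
    return normalize(a, trim, ignore_case) == normalize(b, trim, ignore_case)


print(cells_equal("1.0", "1.00"))                  # compared as strings
print(cells_equal(" 1.0", "1.0 ", trim=True))      # only whitespace differs
print(cells_equal("1.0", "1.00", trim=True))       # trimming does not normalize numbers
print(cells_equal("Paris", "PARIS", ignore_case=True))
```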

The tool uses the union of both column sets. If file A has 5 columns and file B has 7, the result table shows 7 columns. Missing cells in file A are treated as empty strings and compared against file B's values, so those cells will be flagged as modified if file B has non-empty content in those positions.
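A sketch of this padding behavior, assuming columns are aligned by position and using the hypothetical helper names pad_rows and changed_columns:

```python
from typing import List, Tuple

Row = List[str]


def pad_rows(row_a: Row, row_b: Row) -> Tuple[Row, Row]:
    """Pad the shorter row with empty strings so both have the same width."""
    width = max(len(row_a), len(row_b))
    return (
        row_a + [""] * (width - len(row_a)),
        row_b + [""] * (width - len(row_b)),
    )


def changed_columns(row_a: Row, row_b: Row) -> List[int]:
    """Return the 0-based indexes of columns whose values differ after padding."""
    a, b = pad_rows(row_a, row_b)
    return [j for j, (x, y) in enumerate(zip(a, b)) if x != y]


row_a = ["1", "alice", "10", "NY", "yes"]               # 5 columns
row_b = ["1", "alice", "10", "NY", "yes", "2024", ""]   # 7 columns
print(changed_columns(row_a, row_b))
```

Here only the sixth column is reported, because the seventh is empty in both rows once file A's row has been padded.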

Yes, the two files can use different delimiters. Each file's delimiter is detected independently using frequency analysis of the first 5 lines. You can also manually override the delimiter for each file in the settings panel. The tool supports comma, semicolon, tab, and pipe delimiters.
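The exact detection heuristic is not documented here, so the following is only one plausible way to do frequency analysis over the first 5 lines. The function name detect_delimiter and the scoring rule (prefer a candidate that occurs the same number of times on every sampled line, then the highest total count) are assumptions of this sketch, which also ignores quoting for simplicity.

```python
from typing import List

CANDIDATES: List[str] = [",", ";", "\t", "|"]


def detect_delimiter(text: str, sample_lines: int = 5) -> str:
    """Guess the delimiter from the first few non-empty lines of a file."""
    lines = [ln for ln in text.splitlines() if ln][:sample_lines]
    best, best_score = ",", (False, 0)  # fall back to comma
    for d in CANDIDATES:
        counts = [ln.count(d) for ln in lines]
        total = sum(counts)
        if total == 0:
            continue
        consistent = len(set(counts)) == 1  # same count on every line
        score = (consistent, total)
        if score > best_score:
            best, best_score = d, score
    return best


print(repr(detect_delimiter("a;b;c\n1;2;3\n4;5;6\n")))
print(repr(detect_delimiter("name|note\nx|a,b,c\ny|d\n")))
```

In the second sample the comma appears more often on one line, but the pipe appears exactly once on every line, so the consistency rule picks the pipe.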

The parser is RFC 4180-compliant. Fields wrapped in double quotes can contain commas, newlines, and literal double-quote characters (escaped as two consecutive double quotes: ""). The parser correctly reconstructs multi-line fields without splitting them into separate rows.
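To see what these quoting rules mean in practice, the example below parses a small sample with Python's standard csv module. This is not the tool's own parser; it is used here only because its default dialect handles the same cases (embedded commas, doubled quotes, and newlines inside quoted fields).

```python
import csv
import io

raw = (
    'id,name,note\r\n'
    '1,"Smith, Jane","said ""hi"""\r\n'
    '2,Bob,"line one\nline two"\r\n'
)

# newline="" leaves line endings untouched so the csv module can handle
# newlines that appear inside quoted fields.
rows = list(csv.reader(io.StringIO(raw, newline="")))

for row in rows:
    print(row)
```

The sample yields three rows: the header, a row whose second field contains a comma and whose third field contains literal double quotes, and a row whose third field spans two lines.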

The practical limit depends on available browser memory. Files up to 50 MB (roughly 500,000 rows with 10 columns) work reliably on modern devices with 4 GB+ RAM. The diff computation runs in a Web Worker to prevent UI freezing. For larger files, consider splitting them into chunks or using a command-line tool like csvdiff.