About

Comparing tabular data manually is error-prone once row counts exceed a few dozen. A single misaligned row shifts every subsequent comparison, producing false positives across the entire sheet. This tool implements a hash-accelerated row matching algorithm that identifies added, removed, and modified rows between two CSV files. It operates at the cell level: when two rows share a key or positional match, each cell is compared independently, so you see exactly which column changed and what the prior value was. The parser is RFC 4180-compliant, correctly handling quoted fields, embedded commas, and escaped double-quote characters ("").

Limitations: comparison is performed in-browser and loads both files into memory. Files beyond 50 MB may cause slowdowns on low-memory devices. Cell values are compared as strings, so 1.0 and 1.00 are flagged as different; the trim-whitespace option strips only leading and trailing spaces and does not normalize numbers. Row order matters in sequential mode. If your files share a unique identifier column, use key-column mode for order-independent matching.


Formulas

Row identity in key-column mode is determined by the value in the designated key column k. For each row r in file A, the tool builds a hash map:

$$\mathrm{Map}_A\big[\,r[k]\,\big] \leftarrow r$$

The same is done for file B. Rows present in both maps are compared cell-by-cell. The diff status for each row is computed as:

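$$
\text{status} =
\begin{cases}
\text{ADDED} & \text{if } key \in B \wedge key \notin A \\
\text{REMOVED} & \text{if } key \in A \wedge key \notin B \\
\text{MODIFIED} & \text{if } \exists\, j : r_A[j] \neq r_B[j] \\
\text{UNCHANGED} & \text{otherwise}
\end{cases}
$$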
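The key-column classification can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's actual code: the names `Status`, `index_by_key`, and `diff_by_key` are hypothetical, rows are plain lists of strings, and duplicate keys after the first are simply dropped rather than reported.

```python
from enum import Enum


class Status(Enum):
    ADDED = "ADDED"
    REMOVED = "REMOVED"
    MODIFIED = "MODIFIED"
    UNCHANGED = "UNCHANGED"


def index_by_key(rows, k):
    """Map each key value to its row; the first occurrence of a key wins."""
    index = {}
    for row in rows:
        index.setdefault(row[k], row)  # later duplicates are ignored
    return index


def diff_by_key(rows_a, rows_b, k):
    """Return {key: Status} for every key seen in either file."""
    map_a = index_by_key(rows_a, k)
    map_b = index_by_key(rows_b, k)
    result = {}
    for key, row_a in map_a.items():
        if key not in map_b:
            result[key] = Status.REMOVED      # key in A only
        elif row_a != map_b[key]:
            result[key] = Status.MODIFIED     # at least one cell differs
        else:
            result[key] = Status.UNCHANGED
    for key in map_b:
        if key not in map_a:
            result[key] = Status.ADDED        # key in B only
    return result


# Hypothetical sample data: column 0 is the key column.
file_a = [["1", "Ann", "10"], ["2", "Bob", "20"], ["3", "Cy", "30"]]
file_b = [["3", "Cy", "30"], ["1", "Ann", "15"], ["4", "Dee", "40"]]
statuses = diff_by_key(file_a, file_b, k=0)
```

Because lookup is by key rather than by position, row 3 is reported as unchanged even though it moved from last to first.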

In sequential mode, rows are paired by index i. If file A has m rows and file B has n rows, rows 1 to min(m, n) are compared pair-wise. Remaining rows in the longer file are marked ADDED or REMOVED.
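Sequential pairing can be sketched as follows. The function name `diff_sequential` is hypothetical, and the sketch assumes that surplus rows in file B count as ADDED and surplus rows in file A as REMOVED, mirroring the key-column definitions.

```python
from itertools import zip_longest


def diff_sequential(rows_a, rows_b):
    """Pair rows by index and return one status string per row position."""
    statuses = []
    # zip_longest pads the shorter file with None, so surplus rows in the
    # longer file still produce an entry.
    for row_a, row_b in zip_longest(rows_a, rows_b):
        if row_a is None:
            statuses.append("ADDED")      # row exists only in file B
        elif row_b is None:
            statuses.append("REMOVED")    # row exists only in file A
        elif row_a != row_b:
            statuses.append("MODIFIED")
        else:
            statuses.append("UNCHANGED")
    return statuses


# Hypothetical sample data: file A has m = 3 rows, file B has n = 2.
seq_result = diff_sequential(
    [["a", "1"], ["b", "2"], ["c", "3"]],
    [["a", "1"], ["b", "9"]],
)
```

Here the first min(m, n) = 2 positions are compared pair-wise and the third row of file A has no partner.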

The FNV-1a hash used for fast row equality pre-check:

hash = 2166136261
for each byte b: hash = (hash ⊕ b) × 16777619

Here hash is a 32-bit unsigned integer (so the multiplication wraps modulo 2^32) and ⊕ is bitwise XOR. This provides fast rejection of non-matching rows before cell-level comparison.
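A Python rendering of the same hash, using the constants given above. How the tool serializes a row into bytes before hashing is not stated; joining cells with the ASCII unit separator in the hypothetical `row_hash` helper is an assumption made only for illustration.

```python
FNV_OFFSET_BASIS = 2166136261  # initial hash value, as in the formula
FNV_PRIME = 16777619


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    h = FNV_OFFSET_BASIS
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFF  # keep the result within 32 bits
    return h


def row_hash(row):
    # Assumption: cells are joined with the unit separator (0x1F) so that
    # ["ab", "c"] and ["a", "bc"] produce different byte strings.
    return fnv1a_32("\x1f".join(row).encode("utf-8"))


def rows_differ(row_a, row_b):
    """Hash pre-check first; fall back to a full comparison on equal hashes."""
    if row_hash(row_a) != row_hash(row_b):
        return True            # different hashes: rows cannot be equal
    return row_a != row_b      # equal hashes may still be a collision
```

Unequal hashes prove the rows differ, but equal hashes do not prove they match, which is why the sketch still compares the rows when the hashes agree.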

Reference Data

| Feature | Sequential Mode | Key-Column Mode |
|---|---|---|
| Row matching | Position-based (row i vs row i) | By unique key value |
| Detects added rows | Yes (extra rows at end) | Yes (key exists only in File B) |
| Detects removed rows | Yes (extra rows at end) | Yes (key exists only in File A) |
| Detects modified cells | Yes | Yes |
| Handles reordered rows | No (flags as modified) | Yes |
| Requires unique column | No | Yes |
| Performance on 10k rows | < 1 s | < 2 s |
| Duplicate key handling | N/A | First occurrence wins |
| Delimiter support | Comma, semicolon, tab, pipe | Comma, semicolon, tab, pipe |
| Encoding | UTF-8 | UTF-8 |
| Max recommended file size | 50 MB | 50 MB |
| Header row required | Optional | Required |
| Export format | CSV with status column | CSV with status column |
| Whitespace trimming | Optional | Optional |
| Case-insensitive compare | Optional | Optional |

Frequently Asked Questions

How are duplicate values in the key column handled?
The tool indexes the first occurrence of each key value. Subsequent rows with the same key are ignored during matching, so duplicates beyond the first appear as unmatched (ADDED or REMOVED, depending on which file they belong to). If your data contains duplicate keys, consider using sequential mode or deduplicating before comparison.
Why are 1.0 and 1.00 reported as different?
All cell values are compared as strings by default, and the string "1.0" is not identical to "1.00". This is intentional: CSV files often contain formatted numbers where trailing zeros carry meaning (e.g., currency, measurement precision). The "Trim whitespace" option strips leading and trailing spaces, but numeric normalization is deliberately not applied, so the tool never assumes that two differently formatted numbers are equivalent.
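The comparison rule can be illustrated with a small helper. `cells_equal` and its `trim` and `ignore_case` parameters are hypothetical names standing in for the tool's whitespace-trimming and case-insensitive options.

```python
def cells_equal(a: str, b: str, trim: bool = False, ignore_case: bool = False) -> bool:
    """Compare two cell values as strings, with optional normalization."""
    if trim:
        # Note: str.strip() removes all surrounding whitespace, not only spaces.
        a, b = a.strip(), b.strip()
    if ignore_case:
        a, b = a.casefold(), b.casefold()
    return a == b  # no numeric parsing: "1.0" and "1.00" stay different
```

With this sketch, `cells_equal(" 1.0", "1.0 ", trim=True)` holds, while `cells_equal("1.0", "1.00", trim=True)` does not, because trimming never touches the digits.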
What happens when the two files have a different number of columns?
The tool uses the union of both column sets. If file A has 5 columns and file B has 7, the result table shows 7 columns. Cells missing from file A are treated as empty strings and compared against file B's values, so they are flagged as modified wherever file B has non-empty content in those positions.
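A sketch of that padding rule, assuming columns are aligned by position; `changed_columns` is a hypothetical helper name.

```python
def changed_columns(row_a, row_b):
    """Return the indices of cells that differ after padding the shorter row."""
    width = max(len(row_a), len(row_b))
    # Missing cells are treated as empty strings.
    padded_a = row_a + [""] * (width - len(row_a))
    padded_b = row_b + [""] * (width - len(row_b))
    return [j for j in range(width) if padded_a[j] != padded_b[j]]


# Hypothetical rows: file A has 5 columns, file B has 7.
short_row = ["1", "x", "y", "z", "w"]
long_row = ["1", "x", "y", "z", "w", "extra", ""]
diff_cols = changed_columns(short_row, long_row)
```

Only index 5 is reported: the seventh cell is empty in file B, so it equals the empty string substituted for file A.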
Can the two files use different delimiters?
Yes. Each file's delimiter is detected independently using frequency analysis of its first 5 lines. You can also override the delimiter manually for each file in the settings panel. The tool supports comma, semicolon, tab, and pipe delimiters.
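The exact scoring used for detection is not described, so the following is only one plausible reading of "frequency analysis": count each candidate delimiter in the first five lines and pick the most frequent. The name `detect_delimiter` and the comma fallback are assumptions.

```python
CANDIDATES = [",", ";", "\t", "|"]


def detect_delimiter(text: str, sample_lines: int = 5) -> str:
    """Pick the candidate delimiter that occurs most often in the sample."""
    lines = text.splitlines()[:sample_lines]
    totals = {d: sum(line.count(d) for line in lines) for d in CANDIDATES}
    # max() returns the first maximal candidate, so ties favor the comma.
    best = max(CANDIDATES, key=lambda d: totals[d])
    # Assumption: fall back to comma when no candidate appears at all.
    return best if totals[best] > 0 else ","
```

This naive count ignores quoting, so a quoted field full of commas could mislead it. A more careful detector would also check that the count is consistent from line to line.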
How are quoted fields, embedded commas, and multi-line values handled?
The parser is RFC 4180-compliant. Fields wrapped in double quotes can contain commas, newlines, and literal double-quote characters (escaped as two consecutive double quotes: ""). Multi-line fields are reconstructed as a single cell rather than split into separate rows.
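These quoting rules can be seen in action with Python's standard `csv` module, used here purely as a stand-in; the tool's own parser is not shown. The sample data is made up.

```python
import csv
import io

# Three records: an embedded comma, an escaped double quote ("") and a
# quoted field that spans two physical lines.
raw = (
    'id,name,note\r\n'
    '1,"Smith, Jane","She said ""hi"""\r\n'
    '2,Bob,"line one\nline two"\r\n'
)

# newline="" leaves line endings untouched so the reader can handle
# newlines inside quoted fields itself.
rows = list(csv.reader(io.StringIO(raw, newline="")))
```

The result has three rows, not four: the line break inside the quoted note stays within one cell.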
What is the maximum file size?
The practical limit depends on available browser memory. Files up to 50 MB (roughly 500,000 rows with 10 columns) work reliably on modern devices with 4 GB or more of RAM. The diff computation runs in a Web Worker to prevent the UI from freezing. For larger files, consider splitting them into chunks or using a command-line tool like csvdiff.