About

Comparing tab-separated value (TSV) files by hand risks missing critical discrepancies. A single mismatched cell in a dataset of thousands of rows can cascade into flawed analytics, broken database imports, or silent data corruption. This tool parses both TSV inputs, builds an index keyed by a user-selected key column (or by full-row identity), and diffs the two indexes in O(n + m) time, where n and m are the row counts of File A and File B. It reports rows that exist only in File A (removed), only in File B (added), or in both but with cell-level differences (modified). The comparison is deterministic and handles edge cases including empty cells, trailing delimiters, and mixed line endings (CRLF and LF).
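The parsing step can be sketched in TypeScript as follows. This is a minimal illustration, not the tool's source: the function name `parseTsv` is hypothetical, and dropping the single empty line left by a final newline is an assumption. The sketch normalizes CRLF to LF, splits on tabs with no quote handling, and keeps empty cells.

```typescript
// Hypothetical parser sketch; not the tool's actual implementation.
function parseTsv(text: string): string[][] {
  // Normalize Windows-style CRLF to LF so mixed line endings parse uniformly.
  const normalized = text.replace(/\r\n/g, "\n");
  const lines = normalized.split("\n");
  // A final newline leaves one empty trailing line; drop it (assumption).
  if (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop();
  }
  // Split every line on tabs: "a\t\tb" gives ["a", "", "b"],
  // and a trailing tab ("a\t") gives ["a", ""].
  return lines.map((line) => line.split("\t"));
}

const parsed = parseTsv("id\tname\r\n1\t\r\n2\tBob\n");
console.log(JSON.stringify(parsed)); // [["id","name"],["1",""],["2","Bob"]]
```

Because the split is purely on `\t`, a trailing tab yields an extra empty cell and a quoted field containing a tab is split in two, matching the edge-case behavior listed under Reference Data.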

The diff engine does not use fuzzy matching. Equality is strict: cell values are compared as trimmed strings. If your TSV contains numeric fields where 1.0 and 1 must be treated as equal, normalize your data beforehand. All processing runs client-side. No data leaves your browser.
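Strict, trimmed equality amounts to a single string comparison. A sketch, with `cellsEqual` as a hypothetical helper name:

```typescript
// Hypothetical helper illustrating strict equality on trimmed strings.
function cellsEqual(a: string, b: string): boolean {
  // trim() removes leading/trailing whitespace; nothing else is normalized.
  return a.trim() === b.trim();
}

console.log(cellsEqual(" 42 ", "42")); // true: surrounding spaces are ignored
console.log(cellsEqual("1.0", "1"));   // false: no numeric interpretation
console.log(cellsEqual("Bob", "bob")); // false: the comparison is case-sensitive
```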

Formulas

The diff algorithm operates in two phases. First, each row is indexed:

key(row) = row[k] (key column mode)

key(row) = hash(row[0] + "\t" + row[1] + …) (full row mode)
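The two key definitions map onto small functions. In this sketch the names `keyColumnKey` and `fullRowKey` are hypothetical, and one technique is deliberately swapped: full row mode here uses the tab-joined string itself as the key instead of its hash, which rules out hash collisions at the cost of storing longer keys.

```typescript
type Row = string[];

// Key column mode: key(row) = row[k]. A missing cell yields "" (assumption).
function keyColumnKey(row: Row, k: number): string {
  return row[k] ?? "";
}

// Full row mode: the tool hashes row[0] + "\t" + row[1] + ...;
// this sketch keeps the joined string itself as the key instead.
function fullRowKey(row: Row): string {
  return row.join("\t");
}

const example: Row = ["USR-042", "Ann"];
console.log(keyColumnKey(example, 0));           // USR-042
console.log(JSON.stringify(fullRowKey(example))); // "USR-042\tAnn"
```

Joining on `\t` is unambiguous because a parsed cell can never contain a tab.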

The hash function used is djb2:

$$h_0 = 5381$$

$$h_{i+1} = \left((h_i \ll 5) + h_i\right) + \mathrm{charCode}(s[i]) = 33\,h_i + \mathrm{charCode}(s[i])$$
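A direct TypeScript transcription of the djb2 recurrence. The formula does not say how overflow is handled; this sketch assumes the common convention of wrapping to an unsigned 32-bit integer with `>>> 0` (JavaScript's `<<` already operates on 32-bit integers). It also assumes charCode means the UTF-16 code unit returned by `charCodeAt`. The names `djb2` and `fullRowHash` are illustrative.

```typescript
// djb2 sketch, assuming unsigned 32-bit wraparound (overflow handling is not specified).
function djb2(s: string): number {
  let h = 5381; // h_0
  for (let i = 0; i < s.length; i++) {
    // h_{i+1} = ((h_i << 5) + h_i) + charCode(s[i]), i.e. 33 * h_i + charCode, mod 2^32
    h = (((h << 5) + h) + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Full row mode key: hash of the cells joined with tabs.
function fullRowHash(row: string[]): number {
  return djb2(row.join("\t"));
}

console.log(djb2(""));  // 5381
console.log(djb2("a")); // 177670 (5381 * 33 + 97)
```

A 32-bit hash can collide: two different rows could, in rare cases, produce the same value and be matched as identical. Confirming each hash match by comparing the joined strings removes that risk.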

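Here, k is the key column index, row is the array of cell strings, s is the concatenated row string, and h_i is the running hash after the first i characters of s. In the second phase, the two indexes are compared: keys found only in File A are reported as removed, keys found only in File B as added, and shared keys whose cells differ as modified. Comparison complexity is O(n + m), where n and m are the row counts of File A and File B, respectively, assuming constant-time hash-table lookups.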
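Both phases in key column mode can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the names `indexByKey`, `rowsDiffer`, and `diffByKey` are hypothetical, duplicate keys keep their last occurrence, and shorter rows are padded with empty strings before the trimmed cell comparison.

```typescript
type Row = string[];

interface Modified { key: string; a: Row; b: Row; }
interface DiffResult { added: Row[]; removed: Row[]; modified: Modified[]; unchanged: number; }

// Phase 1: index rows by column k; a later duplicate overwrites an earlier one.
function indexByKey(rows: Row[], k: number): Map<string, Row> {
  const index = new Map<string, Row>();
  for (const row of rows) index.set(row[k] ?? "", row);
  return index;
}

// True if any trimmed cell differs; the shorter row is padded with "".
function rowsDiffer(a: Row, b: Row): boolean {
  const width = Math.max(a.length, b.length);
  for (let i = 0; i < width; i++) {
    if ((a[i] ?? "").trim() !== (b[i] ?? "").trim()) return true;
  }
  return false;
}

// Phase 2: one pass over each index, i.e. O(n + m) expected map operations.
function diffByKey(fileA: Row[], fileB: Row[], k: number): DiffResult {
  const indexA = indexByKey(fileA, k);
  const indexB = indexByKey(fileB, k);
  const result: DiffResult = { added: [], removed: [], modified: [], unchanged: 0 };
  indexA.forEach((rowA, key) => {
    const rowB = indexB.get(key);
    if (rowB === undefined) result.removed.push(rowA);
    else if (rowsDiffer(rowA, rowB)) result.modified.push({ key, a: rowA, b: rowB });
    else result.unchanged++;
  });
  indexB.forEach((rowB, key) => {
    if (!indexA.has(key)) result.added.push(rowB);
  });
  return result;
}

const fileA: Row[] = [["1", "Ann"], ["2", "Bob"], ["3", "Cy"]];
const fileB: Row[] = [["3", "Cy"], ["2", "Rob"], ["4", "Di"]];
const d = diffByKey(fileA, fileB, 0);
console.log(JSON.stringify(d.removed));               // [["1","Ann"]]
console.log(JSON.stringify(d.added));                 // [["4","Di"]]
console.log(d.modified.map((m) => m.key).join(",")); // 2
console.log(d.unchanged);                             // 1
```

The O(n + m) bound counts map operations; comparing a matched pair of rows still costs time proportional to its column count.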

Reference Data

Diff Status	Symbol	Meaning	Color Code
Added	+	Row exists only in File B	Green (#82B366)
Removed	−	Row exists only in File A	Coral (#E07A6B)
Modified	Δ	Row key matches but cell values differ	Amber (#F0C05A)
Unchanged	=	Row is identical in both files	None (default)

Common TSV Edge Cases
Case	Input	Behavior
Empty cell	\t\t	Two consecutive tabs produce an empty-string cell
Trailing tab	data\t	Produces an extra empty cell at the end of the row
CRLF line ending	\r\n	Windows-style; the tool normalizes it to \n
LF line ending	\n	Unix/macOS-style
Quoted field	"val\tval"	Quoting is not part of standard TSV; the tool treats the embedded tab as a delimiter

Comparison Modes
Mode	Description
Key Column	Uses a specific column as the row identifier for matching
Full Row	Hashes the entire row; identical rows match regardless of order
Row Index	Compares rows by position (line 1 vs. line 1)
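Row Index mode needs no index at all. A sketch of positional comparison, assuming (the table above does not say) that surplus rows in the longer file are reported as added or removed; `diffByPosition` and `RowStatus` are hypothetical names:

```typescript
type Row = string[];
type RowStatus = "added" | "removed" | "modified" | "unchanged";

// Row Index mode sketch: pair line i of File A with line i of File B.
function diffByPosition(fileA: Row[], fileB: Row[]): RowStatus[] {
  const statuses: RowStatus[] = [];
  const count = Math.max(fileA.length, fileB.length);
  for (let i = 0; i < count; i++) {
    const a: Row | undefined = fileA[i];
    const b: Row | undefined = fileB[i];
    if (a === undefined) { statuses.push("added"); continue; }   // extra row in File B (assumption)
    if (b === undefined) { statuses.push("removed"); continue; } // extra row in File A (assumption)
    // Compare trimmed cells, padding the shorter row with "".
    const width = Math.max(a.length, b.length);
    let same = true;
    for (let c = 0; c < width; c++) {
      if ((a[c] ?? "").trim() !== (b[c] ?? "").trim()) { same = false; break; }
    }
    statuses.push(same ? "unchanged" : "modified");
  }
  return statuses;
}

console.log(diffByPosition([["a"], ["b"]], [["a"], ["c"], ["d"]]).join(",")); // unchanged,modified,added
```

Because pairing is by position, inserting a single row near the top of File B shifts every later pair and can mark every following row as modified.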

Frequently Asked Questions

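Key column mode uses a single column (e.g., an ID field) to pair rows between files. If row 5 in File A has key "USR-042" and row 12 in File B has the same key, the two rows are compared cell by cell. Full-row mode hashes the entire row content and pairs rows with identical hashes, so an edited row appears as one removal plus one addition rather than as a modification. Both modes tolerate rows that have been reordered between files. Use key column mode when each row has a unique identifier and you want cell-level changes reported; use full-row mode when there is no unique identifier.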
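A sketch of full-row matching that shows why an edited row surfaces as a removal plus an addition. Assumptions: the key is the trimmed cells joined by tabs (a string stands in for the tool's djb2 hash), identical rows keep their last occurrence, and `rowKey`, `indexRows`, and `diffFullRow` are hypothetical names.

```typescript
type Row = string[];

// Full-row key sketch: trimmed cells joined by tabs instead of a djb2 hash.
function rowKey(row: Row): string {
  return row.map((cell) => cell.trim()).join("\t");
}

// Index rows by full-row key; identical rows collapse to the last occurrence.
function indexRows(rows: Row[]): Map<string, Row> {
  const index = new Map<string, Row>();
  for (const row of rows) index.set(rowKey(row), row);
  return index;
}

// Rows match only if every cell matches, so there is no "modified" status.
function diffFullRow(fileA: Row[], fileB: Row[]): { added: Row[]; removed: Row[] } {
  const indexA = indexRows(fileA);
  const indexB = indexRows(fileB);
  const added: Row[] = [];
  const removed: Row[] = [];
  indexB.forEach((row, key) => { if (!indexA.has(key)) added.push(row); });
  indexA.forEach((row, key) => { if (!indexB.has(key)) removed.push(row); });
  return { added, removed };
}

// An edited row appears twice: the old version removed, the new version added.
const edited = diffFullRow([["2", "Bob"]], [["2", "Rob"]]);
console.log(JSON.stringify(edited)); // {"added":[["2","Rob"]],"removed":[["2","Bob"]]}

// Reordering alone produces no differences.
const reordered = diffFullRow([["1", "x"], ["2", "y"]], [["2", "y"], ["1", "x"]]);
console.log(reordered.added.length + reordered.removed.length); // 0
```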

If multiple rows share the same key value within a single file, only the last occurrence is indexed. Earlier duplicates are effectively invisible to the diff. If your data has non-unique keys, consider using Row Index mode or deduplicating beforehand.
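Duplicates can be detected before diffing. Below is a hypothetical pre-check, `findDuplicateKeys`, that lists every key value appearing more than once together with its zero-based row positions; the last lines show how a `Map` index silently keeps only the last occurrence.

```typescript
type Row = string[];

// Hypothetical pre-check: map each key in column k that occurs more than once
// to the row positions where it appears.
function findDuplicateKeys(rows: Row[], k: number): Map<string, number[]> {
  const positions = new Map<string, number[]>();
  rows.forEach((row, i) => {
    const key = row[k] ?? "";
    const seen = positions.get(key);
    if (seen) seen.push(i);
    else positions.set(key, [i]);
  });
  // Keep only keys seen more than once.
  const duplicates = new Map<string, number[]>();
  positions.forEach((where, key) => {
    if (where.length > 1) duplicates.set(key, where);
  });
  return duplicates;
}

const data: Row[] = [["USR-042", "Ann"], ["USR-043", "Bob"], ["USR-042", "Anne"]];
console.log(JSON.stringify(Array.from(findDuplicateKeys(data, 0).entries()))); // [["USR-042",[0,2]]]

// A Map index keeps only the last row written for each key, so "Ann" is lost.
const lastWins = new Map(data.map((r): [string, Row] => [r[0], r]));
console.log(JSON.stringify(lastWins.get("USR-042"))); // ["USR-042","Anne"]
```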

Column order matters. Columns are compared positionally: if File A has columns [Name, Age] and File B has [Age, Name], every cell will appear modified. Ensure both files share the same column structure; the tool displays the detected headers to help verify alignment.
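When both files have a header row with the same names in a different order, File B can be realigned to File A's column order before comparison. A sketch, assuming row 0 is the header and every column named in the target header exists in the other file; `alignToHeader` is a hypothetical helper, not a feature of the tool.

```typescript
type Row = string[];

// Hypothetical helper: reorder every row so its columns follow targetHeader.
// Assumes rows[0] is the header; throws if a target column is missing.
function alignToHeader(rows: Row[], targetHeader: Row): Row[] {
  if (rows.length === 0) return [];
  const header = rows[0];
  const order = targetHeader.map((col) => {
    const i = header.indexOf(col);
    if (i < 0) throw new Error(`Column "${col}" not found`);
    return i;
  });
  return rows.map((row) => order.map((i) => row[i] ?? ""));
}

const fileA: Row[] = [["Name", "Age"], ["Ann", "31"]];
const fileB: Row[] = [["Age", "Name"], ["31", "Ann"]];
console.log(JSON.stringify(alignToHeader(fileB, fileA[0]))); // [["Name","Age"],["Ann","31"]]
```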

Rows with different column counts are supported. If a row in File A has 5 columns and the matching row in File B has 7, the extra columns in File B are flagged as modified (added cells). Missing columns are treated as empty strings for comparison purposes.
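Cell-level comparison across rows of different widths can be sketched by padding the shorter row with empty strings. `changedColumns` is a hypothetical name; the function returns the zero-based indices of the differing columns.

```typescript
type Row = string[];

// Hypothetical helper: indices of columns whose trimmed values differ.
// The shorter row is padded with "" so missing cells compare as empty strings.
function changedColumns(a: Row, b: Row): number[] {
  const width = Math.max(a.length, b.length);
  const changed: number[] = [];
  for (let i = 0; i < width; i++) {
    const cellA = (a[i] ?? "").trim();
    const cellB = (b[i] ?? "").trim();
    if (cellA !== cellB) changed.push(i);
  }
  return changed;
}

const rowA: Row = ["1", "Ann", "31", "NY", "x"];
const rowB: Row = ["1", "Ann", "32", "NY", "x", "admin", ""];
console.log(JSON.stringify(changedColumns(rowA, rowB))); // [2,5]
```

Under this padding rule, an extra cell in File B is flagged only if it is non-empty; an empty extra cell compares equal to the padded value. How the tool treats that case is not specified, so this is an assumption.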

Processing is entirely in-browser. Practical limits depend on available RAM. Files under 50,000 rows (~10 MB) process near-instantly. Larger files trigger chunked processing with a progress indicator. Above 200,000 rows, expect several seconds of processing time.
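Chunked processing can be sketched as an indexer that handles a fixed number of rows per step and yields to the event loop between steps, so the page can repaint a progress indicator. The function name `indexInChunks`, the chunk size of 10,000 in the example, and the use of `setTimeout` for yielding are all assumptions; the tool's actual chunking strategy is not documented.

```typescript
type Row = string[];

// Hypothetical chunked indexer: indexes chunkSize rows at a time, reports
// progress, then yields so the browser can repaint before the next chunk.
async function indexInChunks(
  rows: Row[],
  k: number,
  chunkSize: number,
  onProgress: (fraction: number) => void,
): Promise<Map<string, Row>> {
  if (chunkSize < 1) throw new Error("chunkSize must be at least 1");
  const index = new Map<string, Row>();
  for (let start = 0; start < rows.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, rows.length);
    for (let i = start; i < end; i++) {
      index.set(rows[i][k] ?? "", rows[i]);
    }
    onProgress(end / rows.length);
    // Let pending rendering and input events run before the next chunk.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
  return index;
}

const bigFile: Row[] = Array.from({ length: 25000 }, (_, i) => [`ID-${i}`, `row ${i}`]);
indexInChunks(bigFile, 0, 10000, (f) => console.log(`${Math.round(f * 100)}%`))
  .then((idx) => console.log(idx.size)); // logs 40%, 80%, 100%, then 25000
```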

All comparisons are string-based after trimming whitespace. The values "1.00" and "1" are considered different. If you need numeric equivalence, pre-process your TSV to normalize number formatting before comparing.
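Below is a hypothetical pre-processing helper, `normalizeNumbers`, that rewrites each cell that looks like a plain decimal number into JavaScript's canonical number form, so "1.00" becomes "1". This sketch has real caveats: it also strips leading zeros from identifiers such as ZIP codes ("02134" becomes "2134"), and integers beyond 2^53 lose digits, so apply it only to columns known to be numeric.

```typescript
// Hypothetical pre-processing step: canonicalize numeric-looking cells.
function normalizeNumbers(tsv: string): string {
  // Plain decimals with optional sign, fraction, and exponent.
  const numeric = /^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$/;
  return tsv
    .split(/\r?\n/) // also converts CRLF line endings to LF
    .map((line) =>
      line
        .split("\t")
        .map((cell) => {
          const t = cell.trim();
          // Rewrite only numeric cells; leave all other text untouched.
          return numeric.test(t) ? String(Number(t)) : cell;
        })
        .join("\t"),
    )
    .join("\n");
}

console.log(JSON.stringify(normalizeNumbers("id\tprice\n1\t1.00\n2\t3.50"))); // "id\tprice\n1\t1\n2\t3.5"
```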