List Comparison Tool
Compare two text lists to find intersection, difference, and union using set theory. Ideal for data cleaning, deduplication, and analyzing dataset relationships.
About
Data integrity often depends on understanding how two datasets relate. When merging database tables, validating migration scripts, or reconciling email marketing lists, analysts must identify which items exist in both sources and which are unique to one. Manual comparison in spreadsheets is error-prone and becomes impractical at large volumes. This tool applies set theory operations to text data: it computes the intersection, difference, and union of two input lists. All processing happens in the browser, so no data is uploaded and there is no network round trip. Hash-based lookups keep the comparison at roughly O(n) time in the total number of lines, so lists of tens of thousands of lines are handled efficiently. Matching precision is configurable: users can control case sensitivity and whitespace handling to catch near-duplicates caused by formatting inconsistencies.
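As a rough illustration of the hash-based approach and the configurable matching described above, here is a minimal Python sketch. It is not the tool's actual code: the helper names `normalize` and `unique_lines`, the option names `ignore_case` and `trim`, and the choice to skip blank lines are all assumptions made for the example.

```python
def normalize(line: str, ignore_case: bool = True, trim: bool = True) -> str:
    """Build the comparison key for one line (hypothetical helper)."""
    key = line.strip() if trim else line
    return key.lower() if ignore_case else key


def unique_lines(text: str, ignore_case: bool = True, trim: bool = True) -> dict:
    """Map each comparison key to the first original line that produced it.

    A dict is a hash map, so each membership test is O(1) on average and
    the whole pass is O(n) in the number of lines.
    """
    seen = {}
    for raw in text.splitlines():
        key = normalize(raw, ignore_case, trim)
        # Assumption: blank lines are skipped rather than treated as items.
        if key and key not in seen:
            seen[key] = raw.strip() if trim else raw
    return seen


list_a = "Alice@example.com\nbob@example.com \nalice@example.com"
list_b = "BOB@example.com\ncarol@example.com"

a = unique_lines(list_a)
b = unique_lines(list_b)

# Each lookup in b is a hash lookup, so these passes stay linear.
in_both = [a[k] for k in a if k in b]
only_a = [a[k] for k in a if k not in b]
```

With both options enabled, `bob@example.com ` (trailing space) and `BOB@example.com` share one key, so the address lands in `in_both`, while the two spellings of Alice's address collapse into a single entry in `only_a`.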
Formulas
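The core logic relies on set theory operations. Let A be the set of unique lines in the first list and B the set of unique lines in the second list. The computed sets are defined as follows:

$$A \cap B = \{\, x : x \in A \text{ and } x \in B \,\}$$

$$A - B = \{\, x : x \in A \text{ and } x \notin B \,\}$$

$$B - A = \{\, x : x \in B \text{ and } x \notin A \,\}$$

$$A \cup B = \{\, x : x \in A \text{ or } x \in B \,\}$$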
When the Ignore Case option is active, every element x is mapped through f(x) = lower(x) before comparison, so lines that differ only in letter case are treated as the same element.
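The effect of the mapping f can be sketched in Python as follows. The function name `compare` and the result keys are hypothetical, chosen only for this illustration; the sketch applies f to every line and then uses Python's built-in set operators.

```python
def compare(lines_a, lines_b, ignore_case=False):
    """Return the four basic set results for two lists of lines."""
    # f is the mapping applied before comparison: lower(x) or the identity.
    f = str.lower if ignore_case else (lambda x: x)
    a = {f(x) for x in lines_a}
    b = {f(x) for x in lines_b}
    return {
        "intersection": a & b,  # in both lists
        "a_only": a - b,        # in A but not in B
        "b_only": b - a,        # in B but not in A
        "union": a | b,         # every unique element
    }


first = ["Apple", "banana", "Cherry"]
second = ["apple", "cherry", "date"]

relaxed = compare(first, second, ignore_case=True)
strict = compare(first, second, ignore_case=False)
```

With `ignore_case=True`, the intersection is `{"apple", "cherry"}`. With the option off, `"Apple"` and `"apple"` are distinct elements, so the intersection is empty and all three lines of the first list appear in `a_only`.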
Reference Data
| Operation | Set Notation | Definition | SQL Equivalent | Python Equivalent |
|---|---|---|---|---|
| Intersection | A ∩ B | Elements present in both List A and List B | INTERSECT (or INNER JOIN on the key) | a.intersection(b) |
| Difference (A Only) | A − B | Elements in List A but not in List B | EXCEPT (or LEFT JOIN ... WHERE b.id IS NULL) | a.difference(b) |
| Difference (B Only) | B − A | Elements in List B but not in List A | EXCEPT with operands swapped (or RIGHT JOIN ... WHERE a.id IS NULL) | b.difference(a) |
| Union | A ∪ B | All unique elements from both lists | UNION (or FULL OUTER JOIN on the key) | a.union(b) |
| Symmetric Difference | A Δ B | Elements in either A or B, but not both | (A UNION B) EXCEPT (A INTERSECT B) | a.symmetric_difference(b) |
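The Python column of the table can be checked directly against the definitions. The short sketch below uses made-up sample values and confirms that the symmetric difference equals the union minus the intersection, and equally the union of the two one-sided differences.

```python
a = {"red", "green", "blue"}
b = {"green", "blue", "yellow"}

sym = a.symmetric_difference(b)

# Symmetric difference expressed as (Union) - (Intersection).
union_minus_intersection = a.union(b).difference(a.intersection(b))

# The same result built from the two one-sided differences.
one_sided = a.difference(b).union(b.difference(a))

# Sanity check on sizes: |A ∪ B| = |A| + |B| - |A ∩ B|.
sizes_agree = len(a.union(b)) == len(a) + len(b) - len(a.intersection(b))
```

Here `sym`, `union_minus_intersection`, and `one_sided` all evaluate to `{"red", "yellow"}`, and `sizes_agree` is `True`.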