About

Data integrity often depends on understanding how two datasets relate. When merging database tables, validating migration scripts, or reconciling email marketing lists, analysts must identify which items exist in both sources and which are unique to one. Manual comparison in spreadsheets is error-prone and does not scale to large volumes. This tool applies set theory to text data: it computes the intersection, difference, and union of two input lists. Processing occurs entirely within the browser, so there is no network round trip and the data stays private. It handles tens of thousands of lines efficiently by using hash-based structures, which give O(n) average-case complexity. Matching precision is configurable: users can control case sensitivity and whitespace handling to catch near-duplicates caused by formatting inconsistencies.


Formulas

The core logic relies on Set Theory operations. Let A be the set of unique lines in the first list and B be the set of unique lines in the second list. The computed subsets are defined as follows:

Intersection: $A \cap B = \{x \mid x \in A \land x \in B\}$
Difference (A): $A - B = \{x \mid x \in A \land x \notin B\}$
Union: $A \cup B = \{x \mid x \in A \lor x \in B\}$

When the Ignore Case option is active, every element x is first mapped through f(x) = lower(x), so the operations are computed on f(A) and f(B) rather than on the raw lines.
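The definitions above can be sketched in Python. This is a minimal illustration under stated assumptions, not the tool's actual JavaScript: the names `normalize` and `compare_lists`, the `ignore_case` and `trim` parameters, and the choice to discard empty lines are all hypothetical.

```python
def normalize(line: str, ignore_case: bool = False, trim: bool = True) -> str:
    """Apply the optional normalization steps to one line (assumed behavior)."""
    if trim:
        line = line.strip()
    if ignore_case:
        line = line.lower()  # the mapping f(x) = lower(x)
    return line


def compare_lists(text_a: str, text_b: str,
                  ignore_case: bool = False, trim: bool = True) -> dict:
    """Split two newline-separated texts into sets and compute the subsets."""
    a = {normalize(l, ignore_case, trim) for l in text_a.splitlines()}
    b = {normalize(l, ignore_case, trim) for l in text_b.splitlines()}
    # Assumption: blank lines are not treated as elements.
    a.discard("")
    b.discard("")
    return {
        "intersection": a & b,  # in both A and B
        "a_only": a - b,        # in A but not in B
        "b_only": b - a,        # in B but not in A
        "union": a | b,         # in A or in B
    }


result = compare_lists("Apple\nbanana\ncherry", "apple\nBanana \ndate",
                       ignore_case=True)
print(sorted(result["intersection"]))  # ['apple', 'banana']
```

With `ignore_case=False`, "Apple" and "apple" would remain distinct elements and would not appear in the intersection.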

Reference Data

| Operation | Set Notation | Definition | SQL Equivalent | Python Equivalent |
| --- | --- | --- | --- | --- |
| Intersection | A ∩ B | Elements present in both List A and List B | INNER JOIN | a.intersection(b) |
| Difference (A Only) | A - B | Elements in List A but not in List B | LEFT JOIN ... WHERE b.id IS NULL | a.difference(b) |
| Difference (B Only) | B - A | Elements in List B but not in List A | RIGHT JOIN ... WHERE a.id IS NULL | b.difference(a) |
| Union | A ∪ B | All unique elements from both lists | FULL OUTER JOIN | a.union(b) |
| Symmetric Difference | A Δ B | Elements in either A or B, but not both | (Union) - (Intersection) | a.symmetric_difference(b) |
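The Python column can be checked directly with the built-in `set` type. The sample values below are made up for illustration; the methods are standard, and each has an operator form noted in the comments.

```python
a = {"ann", "bob", "cho"}
b = {"bob", "cho", "dee"}

intersection = a.intersection(b)      # also written a & b
a_only = a.difference(b)              # also written a - b
b_only = b.difference(a)              # also written b - a
union = a.union(b)                    # also written a | b
sym_diff = a.symmetric_difference(b)  # also written a ^ b

# Symmetric difference equals the union minus the intersection.
assert sym_diff == union - intersection
# It also equals the two one-sided differences combined.
assert sym_diff == a_only | b_only

print(sorted(sym_diff))  # ['ann', 'dee']
```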

Frequently Asked Questions

Why are duplicate lines counted only once?
This tool strictly applies set theory. A mathematical set, by definition, cannot contain duplicate elements. If List A contains the same email address three times, it is therefore treated as a single element during calculation, and the output counts reflect unique items only.
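A short Python illustration of that collapsing behavior, using made-up addresses. The `Counter` step is an addition for this example, showing one way to recover repeat counts before they are lost; it is not a feature the tool is described as having.

```python
from collections import Counter

list_a = [
    "alice@example.com",
    "bob@example.com",
    "alice@example.com",
    "alice@example.com",
]

unique_a = set(list_a)        # duplicates collapse into one element
raw_count = len(list_a)       # 4 input lines
unique_count = len(unique_a)  # 2 unique elements

# If repeat counts matter, tally them before converting to a set.
repeats = {item: n for item, n in Counter(list_a).items() if n > 1}
print(repeats)  # {'alice@example.com': 3}
```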
Why does case sensitivity matter?
Computers treat two email addresses that differ only in capitalization as distinct strings. Disabling case sensitivity normalizes all text to lowercase before hashing. This is critical for reconciling human-entered data where capitalization is inconsistent but the meaning is identical.
Is my data uploaded to a server?
No. All logic is executed via JavaScript within your local browser instance, and the data never leaves your device. This matters for compliance when handling sensitive lists such as customer emails or internal database keys.
How large can the lists be?
Performance depends on the available RAM of your device. Modern browsers can typically handle sets containing several hundred thousand items without significant delay. The algorithm uses hash maps, which yield O(n) average-case complexity: processing time grows linearly with list size rather than quadratically, as it would with a naive pairwise comparison.
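The difference between the two approaches can be sketched as follows. Both functions are hypothetical illustrations written for this example rather than the tool's own code; they return the same result but do very different amounts of work.

```python
def intersect_hashed(list_a, list_b):
    """One pass over each list: O(len(a) + len(b)) on average."""
    seen = set(list_a)                       # build a hash set of A once
    return {x for x in list_b if x in seen}  # O(1) average lookup per item


def intersect_naive(list_a, list_b):
    """Scans list_a for every item of list_b: O(len(a) * len(b))."""
    return {x for x in list_b if x in list_a}


items_a = [f"id-{i}" for i in range(0, 2000)]
items_b = [f"id-{i}" for i in range(1000, 3000)]

shared = intersect_hashed(items_a, items_b)
assert shared == intersect_naive(items_a, items_b)
print(len(shared))  # 1000
```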
Can I paste CSV data?
The tool expects raw text separated by newlines. If you paste CSV data, it treats each entire line (including commas) as the comparison string. For an accurate column-specific comparison, extract the desired column from your CSV before pasting it into the input fields.
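One way to do that extraction is with Python's standard `csv` module. This is a minimal sketch that assumes the CSV has a header row; the function name `extract_column` and the sample data are hypothetical.

```python
import csv
import io


def extract_column(csv_text: str, column: str) -> list[str]:
    """Return the values of one named column, one per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first row = header
    return [row[column] for row in reader]


csv_text = 'name,email\n"Doe, Jane",jane@example.com\nBob,bob@example.com\n'
lines = extract_column(csv_text, "email")

# One value per line, ready to paste into an input field.
print("\n".join(lines))
```

Using a CSV parser rather than splitting on commas keeps quoted fields such as "Doe, Jane" intact.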