Compare Text
Compare two texts side-by-side with line-by-line and word-level diff highlighting. Find additions, deletions, and similarities instantly.
About
Manually scanning two text versions for differences is error-prone. A single misplaced comma in a contract, a silently altered clause, or an overwritten variable name in source code can cascade into costly failures. This tool computes a precise line-level and word-level diff using the Longest Common Subsequence algorithm, reporting every insertion, deletion, and unchanged region. It calculates a similarity ratio S as a percentage, plus Levenshtein edit distance d, character counts, and word counts for both inputs. Results are deterministic and instantaneous for texts up to ~100k characters. The tool assumes plain-text input and compares Unicode codepoints; it does not normalize whitespace or ignore case unless you choose those options explicitly.
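As a rough illustration of the line-level comparison described above, the sketch below builds an LCS table over lines and walks it to label each line as unchanged, removed, or added. The function name `line_diff` and the `' '`/`'-'`/`'+'` tags are assumptions made for this example; the tool's internal implementation may differ.

```python
def line_diff(a: str, b: str) -> list[tuple[str, str]]:
    """Return (tag, line) pairs: ' ' unchanged, '-' only in A, '+' only in B."""
    x, y = a.splitlines(), b.splitlines()
    m, n = len(x), len(y)

    # lcs[i][j] holds the LCS length of the suffixes x[i:] and y[j:].
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if x[i] == y[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk forward through both texts, following the table.
    ops: list[tuple[str, str]] = []
    i = j = 0
    while i < m and j < n:
        if x[i] == y[j]:
            ops.append((" ", x[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(("-", x[i]))
            i += 1
        else:
            ops.append(("+", y[j]))
            j += 1

    # Whatever remains on either side is a pure deletion or insertion.
    ops.extend(("-", line) for line in x[i:])
    ops.extend(("+", line) for line in y[j:])
    return ops
```

Counting the `'+'`, `'-'`, and `' '` tags in the result gives the added, removed, and unchanged line totals. The full table needs memory proportional to the product of the two line counts, which is one reason a practical limit on input size is reasonable.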
Formulas
The similarity ratio is derived from the Longest Common Subsequence length relative to both inputs:
$$S = \frac{2 \cdot L_{LCS}}{L_A + L_B} \times 100\%$$
Where $L_A$ and $L_B$ are the line counts of Text A and Text B respectively, and $L_{LCS}$ is the length of their longest common subsequence of lines.
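A minimal sketch of the similarity ratio, assuming S is the LCS length divided by the average of the two line counts (the reading suggested by the reference table). The helper names `lcs_length` and `similarity_percent` are hypothetical, and returning 100% for two empty inputs is a convention chosen for this example only.

```python
def lcs_length(x: list[str], y: list[str]) -> int:
    """LCS length of two sequences, keeping only one table row in memory."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity_percent(a: str, b: str) -> float:
    """Similarity as 2 * L_LCS / (L_A + L_B), expressed as a percentage."""
    x, y = a.splitlines(), b.splitlines()
    if not x and not y:
        return 100.0  # assumption: two empty texts count as identical
    return 200.0 * lcs_length(x, y) / (len(x) + len(y))
```

For two three-line texts that share two lines in order, this gives 2 · 2 / (3 + 3), or roughly 66.7%.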
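The Levenshtein distance between two strings of lengths m and n is computed via dynamic programming:
$$d(i, j) = \min\bigl(d(i-1, j) + 1,\; d(i, j-1) + 1,\; d(i-1, j-1) + c\bigr)$$
Where c = 0 if the characters at positions i and j match, 1 otherwise, with base cases d(i, 0) = i and d(0, j) = j. The reported distance is d = d(m, n).
The Jaccard similarity index for word sets is:
$$J = \frac{|U_A \cap U_B|}{|U_A \cup U_B|}$$
Where $U_A$ and $U_B$ are the sets of unique words in Text A and Text B.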
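Both metrics can be sketched in a few lines. The code below is an illustrative version of the standard Levenshtein dynamic program (two rows instead of the full table) and of the Jaccard index over whitespace-delimited words. The function names are hypothetical, and treating two empty word sets as J = 1 is an assumption made here, since the ratio is otherwise undefined.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character inserts, deletes, and substitutions."""
    prev = list(range(len(t) + 1))  # base case: d(0, j) = j
    for i, sc in enumerate(s, start=1):
        cur = [i]  # base case: d(i, 0) = i
        for j, tc in enumerate(t, start=1):
            c = 0 if sc == tc else 1
            cur.append(min(prev[j] + 1,        # delete from s
                           cur[j - 1] + 1,     # insert into s
                           prev[j - 1] + c))   # match or substitute
        prev = cur
    return prev[-1]


def jaccard_words(a: str, b: str) -> float:
    """Intersection over union of the unique word sets of two texts."""
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a and not words_b:
        return 1.0  # assumption: two empty word sets count as identical
    return len(words_a & words_b) / len(words_a | words_b)
```

For example, `levenshtein("kitten", "sitting")` is 3, and `jaccard_words("the quick fox", "the lazy fox")` is 0.5 because two of the four distinct words are shared.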
Reference Data
| Metric | Symbol | Description | Range |
|---|---|---|---|
| Similarity Ratio | S | LCS length as a percentage of the average line count of the two texts | 0 - 100% |
| Levenshtein Distance | d | Minimum single-character edits (insert, delete, substitute) | 0 - max(m,n) |
| Lines Added | L+ | Lines present only in Text B | ≥ 0 |
| Lines Removed | L− | Lines present only in Text A | ≥ 0 |
| Lines Unchanged | L= | Identical lines in both texts | ≥ 0 |
| Words (Text A) | $W_A$ | Whitespace-delimited token count in original | ≥ 0 |
| Words (Text B) | $W_B$ | Whitespace-delimited token count in modified | ≥ 0 |
| Characters (Text A) | $C_A$ | Total Unicode codepoints in original | ≥ 0 |
| Characters (Text B) | $C_B$ | Total Unicode codepoints in modified | ≥ 0 |
| LCS Length | $L_{LCS}$ | Length of longest common subsequence (lines) | ≥ 0 |
| Edit Operations | E | Total insert + delete operations in diff | ≥ 0 |
| Jaccard Index (Words) | J | Intersection over union of unique word sets | 0 - 1 |
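
The word and character counts in the table can be approximated as follows. This is a sketch under the table's own definitions (whitespace-delimited tokens, Unicode codepoints); the function name `text_stats` and the dictionary keys are hypothetical.

```python
def text_stats(text: str) -> dict[str, int]:
    """Word and character counts for one input text."""
    return {
        # str.split() with no argument splits on any run of whitespace,
        # so repeated spaces and newlines do not produce empty tokens.
        "words": len(text.split()),
        # len() on a Python str counts Unicode codepoints, not bytes.
        "characters": len(text),
    }
```

Because the count is per codepoint, a precomposed "é" (U+00E9) counts as one character while "e" followed by a combining acute accent (U+0301) counts as two, even though both render the same.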