About

Manually scanning two text versions for differences is error-prone. A single misplaced comma in a contract, a silently altered clause, or an overwritten variable name in source code can cascade into costly failures. This tool computes a precise line-level and word-level diff using the Longest Common Subsequence algorithm, reporting every insertion, deletion, and unchanged region. It calculates a similarity ratio S as a percentage, plus Levenshtein edit distance d, character counts, and word counts for both inputs. Results are deterministic and instantaneous for texts up to ~100k characters. The tool assumes plain-text input and compares Unicode codepoints; it does not normalize whitespace or ignore case unless you choose those options explicitly.


Formulas

The similarity ratio is derived from the Longest Common Subsequence length relative to both inputs:

$$S = \frac{2 \times L_{LCS}}{L_A + L_B} \times 100\%$$

Where $L_A$ and $L_B$ are the line counts of Text A and Text B respectively, and $L_{LCS}$ is the length of their longest common subsequence of lines.
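
As a minimal sketch of the ratio above (not the tool's actual source), the following Python computes the line-level LCS length with a rolling dynamic-programming row and then applies the formula. The function names `lcs_length` and `similarity_ratio` are illustrative, and returning 100% for two empty inputs is an assumption the page does not state.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two lists of lines."""
    # prev[j] is the LCS length of the lines of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for line_a in a:
        curr = [0]
        for j, line_b in enumerate(b, start=1):
            if line_a == line_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity_ratio(text_a: str, text_b: str) -> float:
    """S = 2 * L_LCS / (L_A + L_B) * 100, computed on lines."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    total = len(lines_a) + len(lines_b)
    if total == 0:
        return 100.0  # assumption: two empty texts count as identical
    return 2 * lcs_length(lines_a, lines_b) / total * 100


if __name__ == "__main__":
    # Two of three lines are shared in order, so S = 2 * 2 / 6 * 100.
    print(round(similarity_ratio("a\nb\nc", "a\nx\nc"), 2))
```

The rolling row keeps memory proportional to the shorter dimension instead of the full m×n table, which is enough when only the length (and not the diff itself) is needed.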

The Levenshtein distance between two strings of lengths m and n is computed via dynamic programming:

$$d(i, j) = \min\bigl(d(i-1, j) + 1,\; d(i, j-1) + 1,\; d(i-1, j-1) + c\bigr)$$

Where c = 0 if characters match, 1 otherwise. The Jaccard similarity index for word sets is:

$$J = \frac{|A \cap B|}{|A \cup B|}$$

Where A and B are the sets of unique words in Text A and Text B.
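
The two formulas above can be sketched in Python as follows. This is an illustrative implementation, not the tool's own code: the function names are hypothetical, the base cases d(i, 0) = i and d(0, j) = j are the standard ones (the page does not spell them out), and returning 1.0 for two empty word sets is an assumption.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    # prev[j] holds d(i-1, j); start with the base case d(0, j) = j.
    prev = list(range(len(t) + 1))
    for i, ch_s in enumerate(s, start=1):
        curr = [i]  # base case d(i, 0) = i
        for j, ch_t in enumerate(t, start=1):
            c = 0 if ch_s == ch_t else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + c))   # match or substitution
        prev = curr
    return prev[-1]


def jaccard(text_a: str, text_b: str) -> float:
    """Intersection over union of the unique whitespace-delimited words."""
    words_a = set(text_a.split())
    words_b = set(text_b.split())
    union = words_a | words_b
    if not union:
        return 1.0  # assumption: two empty word sets count as identical
    return len(words_a & words_b) / len(union)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(jaccard("a b c", "b c d"))         # 0.5
```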

Reference Data

| Metric | Symbol | Description | Range |
|---|---|---|---|
| Similarity Ratio | S | Percentage of common subsequence length to average text length | 0 to 100% |
| Levenshtein Distance | d | Minimum single-character edits (insert, delete, substitute) | 0 to max(m, n) |
| Lines Added | L+ | Lines present only in Text B | ≥ 0 |
| Lines Removed | L− | Lines present only in Text A | ≥ 0 |
| Lines Unchanged | L= | Identical lines in both texts | ≥ 0 |
| Words (Text A) | W_A | Whitespace-delimited token count in original | ≥ 0 |
| Words (Text B) | W_B | Whitespace-delimited token count in modified | ≥ 0 |
| Characters (Text A) | C_A | Total Unicode codepoints in original | ≥ 0 |
| Characters (Text B) | C_B | Total Unicode codepoints in modified | ≥ 0 |
| LCS Length | L_LCS | Length of longest common subsequence (lines) | ≥ 0 |
| Edit Operations | E | Total insert + delete operations in diff | ≥ 0 |
| Jaccard Index (Words) | J | Intersection over union of unique word sets | 0 to 1 |

Frequently Asked Questions

The LCS-based diff does not track line movement as a distinct operation. A line removed from position 5 and appearing at position 12 will show as one deletion and one insertion. This is consistent with standard unified diff behavior used by Git and GNU diff. For detecting moves, you would need a secondary pass comparing removed and added lines, which this tool does not perform to keep output deterministic and unambiguous.
The LCS algorithm has O(m×n) space and time complexity in its basic form. For texts exceeding roughly 10,000 lines each, computation may take several seconds. The tool uses an optimized approach that trims common prefixes and suffixes before running the core diff, which dramatically reduces the problem size for typical revision comparisons where most content is unchanged.
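
The prefix and suffix trimming described above can be sketched as follows. This is a hedged illustration of the general technique, not the tool's implementation; `trim_common_ends` is a hypothetical helper name.

```python
def trim_common_ends(a: list[str], b: list[str]) -> tuple[int, int]:
    """Return (prefix_len, suffix_len): counts of lines shared at each end."""
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap lines already counted in the prefix.
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return prefix, suffix


old = ["header", "alpha", "beta", "gamma", "footer"]
new = ["header", "alpha", "BETA", "gamma", "footer"]
prefix, suffix = trim_common_ends(old, new)

# Only these middle slices need to go through the O(m x n) core diff.
core_old = old[prefix:len(old) - suffix]
core_new = new[prefix:len(new) - suffix]

if __name__ == "__main__":
    print(prefix, suffix, core_old, core_new)
```

In this example the core diff runs on one line per side instead of five, which is why the trimming pays off when most of a revision is unchanged.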
By default, whitespace is significant: every character, including spaces, tabs, and line endings, contributes to the comparison. Enable the "Trim whitespace" option to strip leading and trailing whitespace from each line before comparison. This prevents indentation changes from inflating the edit count, and the similarity ratio S will increase when whitespace-only differences are excluded.
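
A minimal sketch of what such an option might do before the diff runs, assuming it simply strips each line. The function name `normalize_lines` and the `trim_whitespace` parameter are hypothetical and only mirror the option's label.

```python
def normalize_lines(text: str, trim_whitespace: bool = False) -> list[str]:
    """Split text into lines, optionally stripping leading/trailing whitespace."""
    lines = text.splitlines()
    if trim_whitespace:
        lines = [line.strip() for line in lines]
    return lines


original = "if x:\n    return 1"
reindented = "if x:\n\treturn 1"

if __name__ == "__main__":
    # Without trimming, the re-indented line counts as a change.
    print(normalize_lines(original) == normalize_lines(reindented))
    # With trimming, both versions produce the same list of lines.
    print(normalize_lines(original, True) == normalize_lines(reindented, True))
```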
When two lines are paired as a modification (one removed, one added), the tool runs a secondary character-level LCS on those two lines. Characters present in only the old version are highlighted as deletions; characters present only in the new version are highlighted as insertions. This provides granular visibility into what exactly changed within a single line.
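
The character-level pass described above can be sketched as a plain LCS with a forward walk over the table. This is an illustration under assumptions: the tag symbols and the name `char_diff` are hypothetical, and the tool's actual tie-breaking between equally long subsequences may differ.

```python
def char_diff(old: str, new: str) -> list[tuple[str, str]]:
    """Return (tag, char) pairs: '=' shared, '-' only in old, '+' only in new."""
    m, n = len(old), len(new)
    # table[i][j] is the LCS length of the suffixes old[i:] and new[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    ops: list[tuple[str, str]] = []
    i = j = 0
    while i < m and j < n:
        if old[i] == new[j]:
            ops.append(("=", old[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", old[i]))  # dropping old[i] keeps the LCS intact
            i += 1
        else:
            ops.append(("+", new[j]))  # new[j] has no partner in old
            j += 1
    # Whatever remains on either side is a pure deletion or insertion.
    ops.extend(("-", ch) for ch in old[i:])
    ops.extend(("+", ch) for ch in new[j:])
    return ops


if __name__ == "__main__":
    print(char_diff("cat", "cart"))
```

Filtering out the '+' entries reproduces the old line and filtering out the '-' entries reproduces the new one, which is a quick sanity check on any inline highlighter.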
The tool does not understand programming-language syntax. The comparison operates on raw text codepoints with no awareness of language grammar, so it treats Python, JSON, and prose identically. Syntactic equivalences (e.g., reordered JSON keys producing identical objects) will appear as differences. For semantic code comparison, use AST-based tooling. This tool excels at literal textual diff.
Levenshtein distance counts the minimum single-character edits (insertions, deletions, substitutions) needed to transform one string into another. It operates at the character level on the full text. The LCS-based similarity works at the line level, measuring how many lines are shared in sequence. A low Levenshtein distance combined with a low line similarity indicates many small in-line edits spread across many lines, because a line that differs by even one character no longer counts as shared.