
About

The Levenshtein Distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other. It is named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965.

This metric is critical in fields ranging from computer science to biology. In software, it powers spell checkers, optical character recognition (OCR) correction systems, and fuzzy search logic. In bioinformatics, similar algorithms (like Needleman-Wunsch) are used to align DNA and protein sequences to identify evolutionary relationships or mutations.


Formulas

The Levenshtein distance between two strings a and b is lev(|a|, |b|), where lev(i, j) is the distance between the first i characters of a and the first j characters of b:

lev(i, j) = max(i, j)                                 if min(i, j) = 0

lev(i, j) = min( lev(i-1, j) + 1,                     otherwise
                 lev(i, j-1) + 1,
                 lev(i-1, j-1) + 1_(a_i ≠ b_j) )

Where 1_(a_i ≠ b_j) is the indicator function, equal to 0 when the characters a_i and b_j are the same and 1 otherwise. The three terms of the minimum correspond to a deletion, an insertion, and a substitution (or a free match), respectively.
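The recurrence maps directly onto a bottom-up dynamic-programming table. A minimal Python sketch (the function name `levenshtein` is illustrative, not this tool's actual code):

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via the standard DP recurrence."""
    m, n = len(a), len(b)
    # dp[i][j] = distance between a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i              # delete all i characters
    for j in range(n + 1):
        dp[0][j] = j              # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1  # indicator 1(a_i != b_j)
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[m][n]
```

For example, `levenshtein("kitten", "sitting")` returns 3 (substitute k→s, substitute e→i, insert g).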

Reference Data

Metric              | Description                       | Operations                           | Use Case
Levenshtein         | Standard edit distance.           | Insert, Delete, Substitute           | Spell checking, NLP, DNA alignment.
Damerau-Levenshtein | Extension of Levenshtein.         | Ins, Del, Sub, Transposition         | Typo correction (swapped adjacent keys).
Hamming             | Only for strings of equal length. | Substitution only                    | Error-correcting codes, telecommunications.
Jaro-Winkler        | Similarity score (0-1).           | Matching characters, transpositions  | Record linkage, duplicate detection (names).
LCS                 | Longest Common Subsequence.       | Insert, Delete                       | Diff utilities (Git), file comparison.
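To illustrate why the table restricts Hamming distance to equal-length strings: with only substitutions allowed, the distance is simply a count of mismatched positions. A minimal sketch (function name is illustrative):

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance: number of positions where the strings differ."""
    if len(a) != len(b):
        # No insertions or deletions available, so lengths must match
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(c1 != c2 for c1, c2 in zip(a, b))
```

For example, `hamming("karolin", "kathrin")` returns 3.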

Frequently Asked Questions

Does letter case affect the distance?
Yes. In a case-sensitive comparison, "A" and "a" are treated as different characters, requiring a substitution operation (adding 1 to the distance). In case-insensitive mode, they are treated as a match (cost 0). Our tool provides a toggle to switch between these modes.

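One common way to implement such a toggle (a sketch; `normalize` is a hypothetical helper, not necessarily how this tool does it) is to casefold both inputs before computing the distance:

```python
def normalize(s: str, case_sensitive: bool = True) -> str:
    # str.casefold() is like lower() but handles more Unicode
    # case mappings (e.g. German ß -> ss)
    return s if case_sensitive else s.casefold()
```

With `case_sensitive=False`, "A" and "a" both normalize to "a", so the cell cost for that pair becomes 0.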
What counts as a "good" similarity percentage?
Context is king. For spell checking, a similarity > 80% usually suggests a typo. In duplicate detection for databases, you might require > 95%. The percentage is calculated as: (1 - Distance / MaxLength) * 100.

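The percentage formula above, as a short sketch (function name illustrative):

```python
def similarity_pct(distance: int, a: str, b: str) -> float:
    """Similarity as (1 - Distance / MaxLength) * 100."""
    max_len = max(len(a), len(b))
    if max_len == 0:
        return 100.0  # two empty strings are identical
    return (1 - distance / max_len) * 100
```

For "kitten" vs "sitting" with distance 3, this gives (1 - 3/7) * 100 ≈ 57.1%.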
Why is the time complexity O(m*n)?
The algorithm fills a table of size (m+1) x (n+1), where m and n are the string lengths. Since calculating each cell requires looking at three previous cells (constant time), the total time complexity is proportional to the number of cells, i.e., O(m*n).

How is the optimal path recovered from the matrix?
Once the matrix is filled, we start at the bottom-right cell (final distance) and move backwards to the top-left (0,0). At each step, we move to the neighbor with the lowest cost, which reveals whether the optimal operation at that stage was an Insertion, Deletion, or Substitution.
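That backtracking step can be sketched as follows: fill the matrix, then walk from (m, n) back to (0, 0), preferring the diagonal move when it explains the cell's value (`edit_operations` is an illustrative name):

```python
def edit_operations(a: str, b: str) -> list[str]:
    """Fill the DP matrix, then trace back the optimal edit sequence."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,
                           dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + cost)
    # Trace back from (m, n) to (0, 0)
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        cost = 0 if i > 0 and j > 0 and a[i - 1] == b[j - 1] else 1
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + cost:
            ops.append("match" if cost == 0 else f"substitute {a[i-1]}->{b[j-1]}")
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(f"delete {a[i-1]}")
            i -= 1
        else:
            ops.append(f"insert {b[j-1]}")
            j -= 1
    return list(reversed(ops))  # operations in left-to-right order
```

For "kitten" → "sitting" this walk recovers 7 steps, of which 3 carry a cost (two substitutions and one insertion), matching the distance of 3.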