The metric is named after Vladimir Levenshtein, who considered this distance in 1965. It is often used in applications that need to determine how similar, or different, two strings are, such as spell checkers.
For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
1. kitten → sitten (substitution of "s" for "k")
2. sitten → sittin (substitution of "i" for "e")
3. sittin → sitting (insertion of "g" at the end)
It can be considered a generalization of the Hamming distance, which is used for strings of the same length and only considers substitution edits.
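To make the contrast concrete, here is a small Python sketch (the strings "flaw" and "lawn" are an illustrative choice, not taken from the text): the Hamming distance compares the strings position by position, while the Levenshtein distance may also insert and delete characters.

```python
def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(a != b for a, b in zip(s, t))

def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn s into t."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

print(hamming("flaw", "lawn"))      # 4: every position differs
print(levenshtein("flaw", "lawn"))  # 2: drop "f", append "n"
```

Because Hamming distance may only substitute, it can be arbitrarily larger than the Levenshtein distance on the same pair of strings.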
int LevenshteinDistance(char s[1..m], char t[1..n])
    // d is a table with m+1 rows and n+1 columns
    declare int d[0..m, 0..n]

    // any source prefix can be transformed into the empty string
    // by dropping all its characters
    for i from 0 to m
        d[i, 0] := i
    // any target prefix can be reached from the empty string
    // by inserting every character
    for j from 0 to n
        d[0, j] := j

    for i from 1 to m
        for j from 1 to n
            if s[i] = t[j] then
                cost := 0
            else
                cost := 1
            d[i, j] := minimum(d[i-1, j] + 1,       // deletion
                               d[i, j-1] + 1,       // insertion
                               d[i-1, j-1] + cost)  // substitution

    return d[m, n]
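Translated almost line for line into Python (the function name is mine, and Python strings are 0-indexed where the pseudocode is 1-indexed), the algorithm becomes:

```python
def levenshtein_distance(s, t):
    m, n = len(s), len(t)
    # d is a table with m+1 rows and n+1 columns
    d = [[0] * (n + 1) for _ in range(m + 1)]
    # source prefixes reach the empty string by dropping all characters
    for i in range(m + 1):
        d[i][0] = i
    # target prefixes are reached from the empty string by insertions
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # s[i - 1] / t[j - 1]: 0-indexed equivalents of s[i] / t[j]
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

print(levenshtein_distance("kitten", "sitting"))  # 3
```

The table has (m+1)(n+1) cells and each cell is filled once, so the running time and memory use are both O(mn).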
Two examples of the resulting matrix (the minimum steps to be taken are highlighted):
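One such matrix can be regenerated programmatically; the sketch below (function and variable names are mine) builds and prints the full table for "kitten" and "sitting":

```python
def levenshtein_matrix(s, t):
    """Build the full (len(s)+1) x (len(t)+1) table of prefix distances."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d

s, t = "kitten", "sitting"
table = levenshtein_matrix(s, t)
print("      " + "  ".join(t))           # column labels
for i, row in enumerate(table):
    label = " " if i == 0 else s[i - 1]  # row labels
    print(label, row)
# the bottom-right entry, table[6][7], is the distance: 3
```

Each cell d[i][j] is the distance between the prefixes s[1..i] and t[1..j]; tracing a minimum-cost path from the top-left to the bottom-right corner recovers one optimal edit sequence.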
The invariant maintained throughout the algorithm is that we can transform the initial segment s[1..i] into the initial segment t[1..j] using a minimum of d[i,j] operations. At the end, the bottom-right element of the array, d[m,n], contains the answer.
This algorithm is essentially part of a solution to the Longest common subsequence problem (LCS), in the particular case of 2 input lists.
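The connection to LCS can be made precise for the insertion/deletion-only edit distance: for any strings, indel_distance(s, t) = len(s) + len(t) - 2·LCS(s, t), since the cheapest way to edit s into t without substitutions is to delete everything outside a longest common subsequence and insert the rest. A sketch checking this identity (helper names are mine):

```python
def lcs_length(s, t):
    """Length of the longest common subsequence of s and t."""
    m, n = len(s), len(t)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[m][n]

def indel_distance(s, t):
    """Edit distance when only insertions and deletions are allowed."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = min(d[i - 1][j], d[i][j - 1]) + 1
    return d[m][n]

for s, t in [("kitten", "sitting"), ("flaw", "lawn")]:
    assert indel_distance(s, t) == len(s) + len(t) - 2 * lcs_length(s, t)
```

The Levenshtein distance itself differs only in that it additionally allows a substitution at unit cost, which is why the same table-filling scheme solves both problems.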
This invariant holds since:
- It is initially true on row and column 0, because s[1..i] can be transformed into the empty string t[1..0] by simply dropping all i characters, and similarly the empty string s[1..0] can be transformed into t[1..j] by simply adding all j characters.
- If we can transform s[1..i] into t[1..j-1] in k operations, then we can simply add t[j] afterwards to get t[1..j] in k+1 operations (insertion).
- If we can transform s[1..i-1] into t[1..j] in k operations, then we can do the same operations on s[1..i] and then remove the original s[i] at the end in k+1 operations (deletion).
- If we can transform s[1..i-1] into t[1..j-1] in k operations, we can do the same to s[1..i] and then do a substitution of t[j] for the original s[i] at the end if necessary, requiring k+cost operations (substitution).
The number of operations required to transform s[1..m] into t[1..n] is of course the number required to transform all of s into all of t, and so d[m,n] holds our result.
This proof fails to validate that the number placed in
d[i,j] is in fact minimal; this is more difficult to show, and involves an argument by contradiction in which we assume
d[i,j] is smaller than the minimum of the three, and use this to show one of the three is not minimal.
The initialization of d[i,0] can be moved inside the main outer loop.
The cost values can be computed in parallel, and the algorithm can be adapted to perform the minimum function in phases to eliminate dependencies.
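A common space-saving modification in the same spirit keeps only two rows of the table, since each row depends only on the one above it; a Python sketch (names are mine):

```python
def levenshtein_two_rows(s, t):
    """Levenshtein distance using O(len(t)) memory:
    only the previous and current rows of the table are kept."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))     # row 0: d[0][j] = j
    for i in range(1, m + 1):
        curr = [i] + [0] * n      # d[i][0] = i, initialized inside the loop
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

print(levenshtein_two_rows("kitten", "sitting"))  # 3
```

The running time remains O(mn), but the memory drops from O(mn) to O(n); the trade-off is that the full matrix is no longer available for reconstructing the edit sequence.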
The Levenshtein distance has several simple upper and lower bounds that are useful in applications which compute many of them and compare them. These include:
- It is at least the absolute value of the difference of the sizes of the two strings.
- It is at most the length of the longer string.
- It is zero if and only if the strings are equal.
- If the strings have the same size, the Hamming distance is an upper bound on the Levenshtein distance.
- The Levenshtein distance between two strings is no greater than the sum of their Levenshtein distances from a third string (triangle inequality).
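These properties are easy to check empirically. The sketch below (sample strings are my choosing) asserts the lower bound |len(s) - len(t)|, the upper bound max(len(s), len(t)), and the zero-iff-equal property:

```python
def levenshtein(s, t):
    """Standard dynamic-programming Levenshtein distance."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[m][n]

pairs = [("kitten", "sitting"), ("flaw", "lawn"), ("abc", "abc"), ("", "xyz")]
for s, t in pairs:
    dist = levenshtein(s, t)
    assert abs(len(s) - len(t)) <= dist <= max(len(s), len(t))  # lower/upper bounds
    assert (dist == 0) == (s == t)                              # zero iff equal
```

In practice the length-difference lower bound is the most useful: a search can discard candidate strings whose lengths differ too much without running the O(mn) computation at all.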