I understand substitution cost needs to be adjusted, either 0 or 1:
for j from 1 to n: for i from 1 to m: if s(i) = t(j): substitutionCost := 0 else: substitutionCost := 1 d(i, j) := minimum(d(i-1, j) + 1, // deletion d(i, j-1) + 1, // insertion d(i-1, j-1) + substitutionCost) // substitution
But in later proof part, it claims when s(i) == t(j), d(i, j) = d(i-1, j-1):
If s(i) = t(j), and we can transform s(1..i-1) to t(1..j-1) in k operations, then we can do the same to s(1..i) and just leave the last character alone, giving k operations.
I don’t understand, shouldn’t it be d(i, j) = min(d(i-1, j-1), d(i-1, j) + 1, d(i, j-1)+1)?
It’s also mentioned in edit distance wiki
How do you prove if the last chars are matched, d(i, j) is d(i-1, j-1)?
following the lemma here,
I can think one way to prove this is to assert minimal is not from d(i-1, j-1) when s(i) = t(j). Then draw contradiction.
if minimal is not from d(i-1, j-1), the minimum must come from
- deleting last char from s, s(i).
- or inserting last char from t, t(j).
If we go with
1, we need to match s(0..i-1) with t(0..j).
A: if s(i-1) != t(j),
- insert t(j) to end of s, we are rewinding to the original state.
- modify s(i-1) to t(j), it’s equvilent to deleting s(i-1) (+1) when matching s(i-1) with t(j-1), whose minimum is d(i-1, j-1).
- delete s(i-1) -> only viable way
B: if s(i-1) = t(j),
- no insert t(j), otherwise the previous deletion is redundant.
- no modification: equivalent to 2nd point in discussion A.
- deletion -> viable
Combine A and B, only deletion is acceptable and it’s impossible to match if only deletion is performed on s.
Do you guys think this proof is correct?