Faster Algorithm for Bounded Damerau–Levenshtein Distance
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
The Damerau–Levenshtein distance between two strings is the minimum number of insertions, deletions, substitutions, and adjacent transpositions required to transform one string into the other. Unlike the standard Levenshtein distance, it accounts for the common typing error of adjacent character swaps. When edits are restricted so that no substring is edited more than once, existing algorithms for Levenshtein distance can be extended with relatively minor changes to support transpositions. However, in the unrestricted setting (i.e., edits may overlap or interact arbitrarily), the problem becomes significantly more complex, and existing techniques no longer apply directly. In this work, we show that even in the unrestricted setting, the Damerau–Levenshtein distance can be computed efficiently. We present two algorithms that extend the classic $O(n + k^2)$ edit-distance frameworks of Myers (Algorithmica ’86) and Landau and Vishkin (JCSS ’88), adapting them to accommodate unrestricted transpositions. The first algorithm runs in $O(\sigma n + k^2)$ time and space, where $\sigma$ is the alphabet size. The second achieves $O(n + k^2 \log n)$ time and $O(n + k^2)$ space for integer alphabets. Here, $n$ is the length of the input strings and $k$ is the distance threshold. Experimental results show that our algorithms achieve substantial speedups over $k$-independent methods when $k$ is small.