Sequence alignment with k -bounded matching statistics
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Finding high-quality local alignments between a query sequence and sequences contained in a large genomic database is a fundamental problem in computational genomics, at the core of thousands of biological analysis pipelines. Here, we describe a novel algorithm for approximate local alignment search based on the so-called k -bounded matching statistics of the query sequence with respect to an indexed database of sequences. We compute the k -bounded matching statistics, which capture the longest common suffix lengths of consecutive k -mer matches between query and target sequences, using the spectral Burrows-Wheeler transform, a data structure that enables computationally efficient queries. We show that our method is as fast and as accurate as state-of-the-art tools in several bacterial genomics tasks. Our method is available as a set of three kbo Rust packages that provide a command-line interface, a graphical user interface that runs in a browser without server-side processing, and a core library that can be accessed by other tools.