Sequence alignment with k-bounded matching statistics

Tommi Mäklin
Jarno N. Alanko
Elena Biagi
Simon J. Puglisi

Abstract

Finding high-quality local alignments between a query sequence and sequences contained in a large genomic database is a fundamental problem in computational genomics, at the core of thousands of biological analysis pipelines. Here, we describe a novel algorithm for approximate local alignment search based on the so-called k-bounded matching statistics of the query sequence with respect to an indexed database of sequences. We compute the k-bounded matching statistics, which capture the longest common suffix lengths of consecutive k-mer matches between query and target sequences, using the spectral Burrows-Wheeler transform, a data structure that enables computationally efficient queries. We show that our method is as fast and as accurate as state-of-the-art tools in several bacterial genomics tasks. Our method is available as a set of three kbo Rust packages that provide a command-line interface, a graphical user interface that runs in a browser without server-side processing, and a core library that can be accessed by other tools.
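
As a rough illustration of what k-bounded matching statistics measure, the sketch below computes them by brute force. It assumes the statistic at each query position is the length of the longest substring ending there, capped at k, that also occurs in the target. The function name and the set-based lookup are hypothetical stand-ins chosen for clarity; this is not the kbo API and it does not use the spectral Burrows-Wheeler transform described in the abstract.

```python
def k_bounded_matching_statistics(query: str, target: str, k: int) -> list[int]:
    """Naive k-bounded matching statistics (illustrative assumption, not kbo's code).

    For each end position in `query`, report the length (at most k) of the
    longest substring ending there that also occurs somewhere in `target`.
    """
    # Brute-force stand-in for an index: every target substring of length 1..k.
    substrings = set()
    for length in range(1, k + 1):
        for start in range(len(target) - length + 1):
            substrings.add(target[start:start + length])

    stats = []
    for end in range(1, len(query) + 1):
        best = 0
        # Try the longest allowed match first and stop at the first hit.
        for length in range(min(k, end), 0, -1):
            if query[end - length:end] in substrings:
                best = length
                break
        stats.append(best)
    return stats


if __name__ == "__main__":
    # The run of maximal values (3) marks where the query matches the target;
    # the drop to 1 marks where the match breaks.
    print(k_bounded_matching_statistics("ACGTTT", "ACGTAC", k=3))
    # prints [1, 2, 3, 3, 1, 1]
```

Runs of values equal to k indicate stretches where consecutive k-mers of the query are found in the target, and drops indicate where a match ends. This brute-force version needs time and memory that grow with k times the target length, so it is only suitable for toy inputs.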

Version published to 10.1101/2025.05.19.654936v2 on bioRxiv
May 26, 2025
Version published to 10.1101/2025.05.19.654936v1 on bioRxiv
May 24, 2025

Related articles

FastGA: Fast Genome Alignment

This article has 3 authors:
1. Gene Myers
2. Richard Durbin
3. Chenxi Zhou
This article has no evaluations. Latest version: Jun 19, 2025

Kaminari: a resource-frugal index for approximate colored k-mer queries

This article has 6 authors:
1. Victor Levallois
2. Yoshihiro Shibuya
3. Bertrand Le Gal
4. Rob Patro
5. Pierre Peterlongo
6. Giulio Ermanno Pibiri
This article has no evaluations. Latest version: May 21, 2025

Accelerating k-mer-based sequence filtering

This article has 6 authors:
1. Igor Martayan
2. Léa Vandamme
3. Bede Constantinides
4. Bastien Cazaux
5. Charles Paperman
6. Antoine Limasset
This article has no evaluations. Latest version: Jun 20, 2025
