Real-time Taxonomic Characterization of Long-read Mixed-species Sequencing Samples in Sorted Motif Distance Space: Voyager

Abstract

Recent advances in long-read sequencing technology enable its use in potentially life-saving applications such as rapid clinical diagnostics and epidemiological monitoring. To take advantage of these characteristics, we present Voyager, a novel algorithm that complements real-time sequencing by rapidly and efficiently mapping long sequencing reads containing insertion and deletion errors to a large set of reference genomes. The concept of Sorted Motif Distance Space (SMDS), i.e., distances between exact matches of short motifs sorted by rank, represents sequences and sequence complementarity in a highly compressed form and is thus computationally efficient while still enabling strain-level discrimination. In addition, when sequences of closely related organisms cannot be discerned by SMDS alone, Voyager applies a deconvolution algorithm rather than reducing taxonomic resolution. Using relevant real-world data, we evaluated Voyager against the current best taxonomic classification methods (Kraken 2 and Centrifuge). Voyager was on average more than twice as fast as the fastest of these methods and obtained on average over 40% higher species-level accuracy, while using less memory than either.
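
To make the idea of motif distances concrete, the following Python sketch shows one possible reading of the SMDS description. It is not Voyager's implementation. The function names `motif_positions` and `sorted_motif_distances` are hypothetical, and the interpretation of "sorted by rank" as sorting the distances in ascending order is an assumption that the abstract does not confirm.

```python
def motif_positions(seq, motif):
    """Return start positions of all exact matches of motif in seq.

    Overlapping matches are included. Hypothetical helper, not Voyager's API.
    """
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)
    return positions


def sorted_motif_distances(seq, motif):
    """Return distances between consecutive motif matches, in ascending order.

    Assumption: "sorted by rank" is read here as a plain ascending sort.
    """
    pos = motif_positions(seq, motif)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return sorted(gaps)


if __name__ == "__main__":
    read = "ACGTTTTACGTACG"
    print(motif_positions(read, "ACG"))
    print(sorted_motif_distances(read, "ACG"))
```

In this sketch, an insertion or deletion that falls between two motif matches changes only the one distance spanning it and leaves the other distances untouched, which gives some intuition for why such a representation could tolerate indel errors in long reads.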
