Metaxa: A Transformer-Based Deep Learning Model for Taxonomic Classification of Long Nanopore Reads


Abstract

A significant fraction of microbial diversity remains unclassified, hindering our understanding of the roles microbes play in health and ecosystems. State-of-the-art methods such as Kraken 2 perform well for taxa present in their reference database, but their accuracy drops significantly for taxa that are not included. Although deep learning has advanced many fields, its applications in metagenomics remain limited, and its potential has yet to be fully realized. Here, we present Metaxa, a transformer-based deep learning model for the taxonomic classification of long Nanopore reads. Metaxa leverages the sequential context of Nanopore reads, enabling robust classification beyond fixed k-mer profiles. Our results show that Metaxa matches Kraken 2 on in-sample data at both the species and genus levels. It also significantly outperforms both Kraken 2 and MetageNN at the genus level on out-of-sample datasets, in which the query species' genome is absent from the reference database but a different species from the same genus is present. Furthermore, Metaxa demonstrates strong generalization across Nanopore chemistries (R9.4.1 and R10.4.1). This work highlights the potential of deep learning models to improve metagenomic classification accuracy, especially in complex or underexplored environments where traditional tools fall short.
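
As an illustration of why sequential context can carry information that a fixed k-mer profile discards, the minimal Python sketch below compares two short reads. It is not Metaxa's tokenizer or model, whose details are not described in this abstract. The helper names `kmer_profile` and `ordered_kmers` and the toy reads are hypothetical. The two reads are distinct but have identical 2-mer count profiles, and only the ordered k-mer sequence tells them apart.

```python
from collections import Counter


def kmer_profile(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers; the position of each k-mer is discarded."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ordered_kmers(seq: str, k: int) -> list[str]:
    """List all overlapping k-mers in read order (hypothetical sequence-aware input)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    read_a, read_b = "ACAGA", "AGACA"  # toy reads, not real Nanopore data
    # Same multiset of 2-mers {AC, CA, AG, GA}, so the profiles are identical.
    print(kmer_profile(read_a, 2) == kmer_profile(read_b, 2))  # True
    # The order of the 2-mers differs, so an order-aware view separates them.
    print(ordered_kmers(read_a, 2) != ordered_kmers(read_b, 2))  # True
```

A model that consumes ordered tokens, as a transformer does, can in principle use this distinction, while a classifier that only sees count profiles cannot.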
