MADRe: Strain-Level Metagenomic Classification Through Assembly-Driven Database Reduction

Abstract

Strain-level metagenomic classification is essential for understanding microbial diversity and functional potential, but remains challenging, particularly in the absence of prior knowledge about the composition of the sample. In this paper we present MADRe, a modular and scalable pipeline for long-read strain-level metagenomic classification, enhanced with Metagenome Assembly-Driven Database Reduction. MADRe combines long-read metagenome assembly, contig-to-reference mapping reassignment based on an expectation-maximization algorithm for database reduction, and probabilistic read mapping reassignment to achieve sensitive and precise classification. We extensively evaluated MADRe on simulated datasets, mock communities, and a real anaerobic digester sludge metagenome, demonstrating that it consistently outperforms existing tools by achieving higher precision with reduced false positives. MADRe’s design allows users to apply either the database reduction or read classification step individually. Using only the read classification step shows results on par with other tested tools. MADRe is open source and publicly available at https://github.com/lbcb-sci/MADRe.
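To make the idea of expectation-maximization reassignment for database reduction concrete, the following is a minimal, generic sketch and not MADRe's actual implementation. It assumes each contig comes with a set of candidate references and a positive mapping score; the function name em_reassign, the input layout, and the strain labels are hypothetical and chosen only for illustration. The E-step splits each ambiguous contig across its candidates in proportion to the current abundance estimates, the M-step re-estimates abundances from those splits, and the reduced database is the set of references that end up as the best assignment of at least one contig.

```python
def em_reassign(mappings, n_iter=100, tol=1e-9):
    """Generic EM over ambiguous mappings (illustrative assumption, not MADRe's code).

    mappings: dict of item -> dict of reference -> positive mapping score.
    Returns (abundance per reference, best reference per item).
    """
    if not mappings:
        return {}, {}
    refs = sorted({r for hits in mappings.values() for r in hits})
    # Start from a uniform abundance estimate over all candidate references.
    abundance = {r: 1.0 / len(refs) for r in refs}
    n_items = len(mappings)
    posteriors = {}
    for _ in range(n_iter):
        # E-step: posterior of each candidate reference for each item.
        posteriors = {}
        for item, hits in mappings.items():
            weights = {r: abundance[r] * score for r, score in hits.items()}
            total = sum(weights.values())
            posteriors[item] = {r: w / total for r, w in weights.items()}
        # M-step: abundance is the mean posterior mass each reference receives.
        updated = dict.fromkeys(refs, 0.0)
        for post in posteriors.values():
            for r, p in post.items():
                updated[r] += p / n_items
        delta = max(abs(updated[r] - abundance[r]) for r in refs)
        abundance = updated
        if delta < tol:
            break
    # Hard assignment: each item goes to its most probable reference.
    assignment = {item: max(post, key=post.get) for item, post in posteriors.items()}
    return abundance, assignment


# Hypothetical toy input: two contigs map only to strainA,
# one contig maps equally well to strainA and strainB.
example = {
    "contig1": {"strainA": 1.0},
    "contig2": {"strainA": 1.0},
    "contig3": {"strainA": 1.0, "strainB": 1.0},
}
abundance, assignment = em_reassign(example)
# References that keep at least one contig form the reduced database.
reduced_db = sorted(set(assignment.values()))
print(reduced_db)
```

In this toy input the ambiguous contig is pulled toward the reference that is also supported by uniquely mapped contigs, so the weakly supported strain drops out of the reduced database. A real pipeline would derive the scores from alignment quality and would likely weight contigs by length, which this sketch deliberately leaves out.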
