MADRe: Strain-Level Metagenomic Classification Through Assembly-Driven Database Reduction
Abstract
Strain-level metagenomic classification is essential for understanding microbial diversity and functional potential, but remains challenging, particularly in the absence of prior knowledge about the composition of the sample. In this paper we present MADRe, a modular and scalable pipeline for long-read strain-level metagenomic classification, enhanced with Metagenome Assembly-Driven Database Reduction. MADRe combines long-read metagenome assembly, contig-to-reference mapping reassignment based on an expectation-maximization algorithm for database reduction, and probabilistic read mapping reassignment to achieve sensitive and precise classification. We extensively evaluated MADRe on simulated datasets, mock communities, and a real anaerobic digester sludge metagenome, demonstrating that it consistently outperforms existing tools by achieving higher precision with reduced false positives. MADRe’s design allows users to apply either the database reduction or read classification step individually. Using only the read classification step shows results on par with other tested tools. MADRe is open source and publicly available at https://github.com/lbcb-sci/MADRe .
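To illustrate the general idea behind expectation-maximization reassignment of ambiguous contig-to-reference mappings, the following is a minimal sketch in Python. It is not MADRe's implementation: the function name `em_reassign`, the input format (a dictionary from contig IDs to candidate reference IDs), the uniform initialization, and the unweighted mapping model are all assumptions made for illustration only. A real pipeline would likely weight candidates by alignment quality and contig length.

```python
from collections import defaultdict


def em_reassign(candidates, n_iter=50, tol=1e-9):
    """Toy EM over ambiguous mappings (illustrative only, not MADRe's code).

    candidates: dict mapping a contig ID to the list of reference IDs
                it maps to (hypothetical input format).
    Returns (theta, assignment): estimated reference abundances and the
    single most likely reference for each contig.
    """
    # Ignore contigs without any candidate reference.
    candidates = {c: rs for c, rs in candidates.items() if rs}
    refs = sorted({r for rs in candidates.values() for r in rs})
    n = len(candidates)

    # Assumption: start from a uniform abundance over all candidates.
    theta = {r: 1.0 / len(refs) for r in refs}

    for _ in range(n_iter):
        # E-step: split each contig across its candidate references
        # in proportion to the current abundance estimates.
        counts = defaultdict(float)
        for rs in candidates.values():
            total = sum(theta[r] for r in rs)
            for r in rs:
                counts[r] += theta[r] / total

        # M-step: re-estimate abundances from the expected counts.
        new_theta = {r: counts[r] / n for r in refs}
        delta = max(abs(new_theta[r] - theta[r]) for r in refs)
        theta = new_theta
        if delta < tol:
            break

    # Assign each contig to its most abundant candidate reference.
    assignment = {
        c: max(rs, key=lambda r: theta[r]) for c, rs in candidates.items()
    }
    return theta, assignment


# Hypothetical example: two contigs map uniquely to strain A,
# one is ambiguous between A and B, one between B and C.
example = {"c1": ["A"], "c2": ["A"], "c3": ["A", "B"], "c4": ["B", "C"]}
theta, assignment = em_reassign(example)
reduced_database = set(assignment.values())
```

In this toy example, the uniquely mapped contigs pull the ambiguous contig `c3` toward strain A, and `c4` resolves to B rather than C, so the reduced database keeps only A and B. This mirrors, in spirit, how reassignment can shrink a reference database before read classification.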