Chimera: Ultrafast and Memory-efficient Database Construction for High-Accuracy Taxonomic Classification in the Age of Expanding Genomic Data


Abstract

The rapid growth of genomic data expands the diversity of sequenced species but also causes taxonomic imbalance, with certain species heavily overrepresented. Both the volume of data and this imbalance challenge the accuracy and efficiency of metagenomic tools. Here, we present Chimera, a tool built on the Interleaved Merged Cuckoo Filter (IMCF) and the FairMin-Cap (FMC) strategy. Chimera achieves the highest classification accuracy among the tools evaluated while building databases 162 times faster than Kraken2: it constructs the complete RefSeq genome database within minutes using under 32 GB of RAM, enabling rapid and cost-effective database updates. Furthermore, Chimera's memory scalability supports at least 300,000 species, and potentially more than 800,000 species, on practical systems with 1 TB of memory, a scale beyond the reach of traditional tools. Our results establish Chimera as a foundational tool for the next era of metagenomic research and for the ultramassive genome datasets to come.
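The abstract names the Interleaved Merged Cuckoo Filter (IMCF) but does not describe its internals. As background only, the following is a minimal sketch of a standard cuckoo filter (partial-key cuckoo hashing, as introduced by Fan et al.), the general data structure the IMCF's name refers to. It does not reproduce Chimera's interleaving, merging, or FairMin-Cap logic; the class name, fingerprint width, bucket size, and all parameters are hypothetical choices for illustration.

```python
import hashlib
import random


class CuckooFilter:
    """Standard cuckoo filter sketch (NOT Chimera's IMCF).

    Each item is reduced to an 8-bit fingerprint stored in one of two
    candidate buckets; the second bucket is derived from the first by
    XOR with a hash of the fingerprint, so either bucket can be found
    from the other plus the fingerprint alone.
    """

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        # Power-of-two bucket count keeps XOR-derived indices in range.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")

    def _fingerprint(self, item: str) -> int:
        # Hypothetical 8-bit fingerprint in 1..255.
        return self._hash(b"fp:" + item.encode()) % 255 + 1

    def _index(self, item: str) -> int:
        return self._hash(item.encode()) % self.num_buckets

    def _alt_index(self, index: int, fp: int) -> int:
        # Symmetric: _alt_index(_alt_index(i, fp), fp) == i.
        return (index ^ self._hash(fp.to_bytes(1, "little"))) % self.num_buckets

    def _candidates(self, item: str):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: evict a random fingerprint and relocate it
        # to its alternate bucket, repeating up to max_kicks times.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Filter is considered full; the last evicted fingerprint is dropped.
        return False

    def contains(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


if __name__ == "__main__":
    genome = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
    k = 8
    cf = CuckooFilter()
    for i in range(len(genome) - k + 1):
        cf.insert(genome[i:i + k])
    print(cf.contains("ACGTTGCA"))  # True: inserted k-mers are never missed
```

Unlike a Bloom filter, a cuckoo filter supports deletion and can answer membership with at most two bucket probes, which is one reason cuckoo-style filters are attractive for k-mer databases; any false positives come from fingerprint collisions, and their rate depends on fingerprint width and load.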
