AEMB: efficient abundance estimation for metagenomic binning

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Metagenomic binning is a crucial step in metagenomic analysis, namely grouping together contigs that are predicted to originate from the same genome to enable the recovery of metagenome-assembled genomes (MAGs). It has been shown that using information from multiple samples yields better results than binning each sample independently. However, for N metagenomic samples, using full multi-against-multi binning requires N 2 alignments, making it computationally challenging to apply in large-scale metagenomic studies.

Here, we propose AEMB (Abundance Estimation for Metagenomic Binning), a novel mapping mode implemented in strobealign. AEMB is a computationally efficient abundance estimation method that uses a prefix-lookup vector as an indexing structure to reduce memory usage and randstrobes to estimate the abundance of contigs without performing base-level alignment. Compared to the hash table used in the previous version of strobealign, the indexing structure reduces peak memory usage by 25.2% with almost the same runtime. Furthermore, we implemented a fast abundance estimation method that skips base-level alignment. Altogether, AEMB reduces the runtime for abundance estimation by 88% to 96% compared to commonly used alignment methods such as Bowtie2 and BWA, while achieving similar binning results.

AEMB is available as a mapping mode in strobealign https://github.com/ksahlin/strobealign and SemiBin2 (v2.1 and later) accepts its inputs for binning.

Article activity feed