AEMB: efficient abundance estimation for metagenomic binning

Shaojun Pan
Ivan Tolstoganov
Kristoffer Sahlin
Marcel Martin
Xing-Ming Zhao
Luis Pedro Coelho

Abstract

Metagenomic binning is a crucial step in metagenomic analysis: it groups together contigs that are predicted to originate from the same genome, enabling the recovery of metagenome-assembled genomes (MAGs). Using information from multiple samples has been shown to yield better results than binning each sample independently. However, for N metagenomic samples, full multi-against-multi binning (mapping the reads of every sample against the assembly of every sample) requires N² alignments, which makes it computationally challenging to apply in large-scale metagenomic studies.
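
To make the quadratic scaling concrete, here is a minimal Python illustration. It assumes that single-sample binning maps each sample's reads only to its own assembly, while multi-against-multi binning maps every sample's reads to every sample's assembly; the function name `alignment_jobs` is hypothetical.

```python
def alignment_jobs(n_samples: int) -> dict:
    """Count read-to-assembly mapping jobs for n_samples metagenomic samples."""
    return {
        # each sample's reads are mapped only to that sample's own assembly
        "single_sample": n_samples,
        # every sample's reads are mapped to every sample's assembly
        "multi_sample": n_samples ** 2,
    }


for n in (10, 100, 1000):
    print(n, alignment_jobs(n))
```

With 100 samples, this is 10,000 alignment jobs instead of 100, which is why the per-job cost of abundance estimation dominates large studies.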

Here, we propose AEMB (Abundance Estimation for Metagenomic Binning), a novel mapping mode implemented in strobealign. AEMB is a computationally efficient abundance estimation method: it uses a prefix-lookup vector as its indexing structure to reduce memory usage, and it uses randstrobes to estimate the abundance of contigs without performing base-level alignment. Compared with the hash table used in the previous version of strobealign, this indexing structure reduces peak memory usage by 25.2% at almost the same runtime. Furthermore, skipping base-level alignment makes the abundance estimation step itself fast. Altogether, AEMB reduces the runtime for abundance estimation by 88% to 96% compared with commonly used alignment tools such as Bowtie2 and BWA, while achieving similar binning results.
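
The abstract does not spell out the index layout or the read-assignment rule, so the following Python is only a sketch under stated assumptions. Plain forward-strand k-mers hashed with `zlib.crc32` stand in for randstrobes; the index is a flat array of (hash, contig, position) entries sorted by hash; and a prefix-lookup vector indexed by the top `PREFIX_BITS` bits of each hash records where each bucket starts in that array. Each read is credited to the contig(s) with the most seed hits, and counts are divided by contig length. All names (`seeds`, `build_index`, `find`, `estimate_abundance`, `K`, `PREFIX_BITS`) are hypothetical and do not describe strobealign's actual code.

```python
import bisect
import zlib
from collections import defaultdict

K = 15           # hypothetical seed length; AEMB uses randstrobes, not plain k-mers
HASH_BITS = 32   # zlib.crc32 returns an unsigned 32-bit integer
PREFIX_BITS = 8  # hypothetical number of leading hash bits used by the lookup vector


def seeds(seq, k=K):
    """Yield (hash, position) for each forward-strand k-mer (stand-in for randstrobes)."""
    for i in range(len(seq) - k + 1):
        yield zlib.crc32(seq[i:i + k].encode()), i


def build_index(contigs):
    """Flat seed array sorted by hash, plus a prefix-lookup vector of bucket offsets."""
    entries = sorted(
        (h, cid, pos) for cid, seq in contigs.items() for h, pos in seeds(seq)
    )
    shift = HASH_BITS - PREFIX_BITS
    lookup = [0] * ((1 << PREFIX_BITS) + 1)
    for h, _, _ in entries:
        lookup[(h >> shift) + 1] += 1       # count seeds per hash prefix
    for p in range(1 << PREFIX_BITS):
        lookup[p + 1] += lookup[p]          # prefix sums give each bucket's start offset
    return entries, lookup


def find(index, h):
    """Return (contig, position) for every indexed seed whose hash equals h."""
    entries, lookup = index
    p = h >> (HASH_BITS - PREFIX_BITS)
    lo, hi = lookup[p], lookup[p + 1]       # only this slice can contain hash h
    i = bisect.bisect_left(entries, (h,), lo, hi)
    hits = []
    while i < hi and entries[i][0] == h:
        hits.append(entries[i][1:])
        i += 1
    return hits


def estimate_abundance(index, contigs, reads):
    """Credit each read to the contig(s) with most seed hits; no base-level alignment."""
    counts = defaultdict(float)
    for read in reads:
        votes = defaultdict(int)
        for h, _ in seeds(read):
            for cid, _pos in find(index, h):
                votes[cid] += 1
        if not votes:
            continue                         # read has no seed hits: treat as unmapped
        best = max(votes.values())
        winners = [cid for cid, v in votes.items() if v == best]
        for cid in winners:
            counts[cid] += 1 / len(winners)  # split ties evenly between contigs
    # reads per base of contig length, a simple coverage proxy
    return {cid: counts[cid] / len(seq) for cid, seq in contigs.items()}
```

The memory argument is visible in `build_index`: apart from the sorted seed array itself, the lookup vector adds only 2^PREFIX_BITS + 1 integers, whereas a hash table typically carries per-entry bucket overhead and unused slots. How closely this mirrors the reported 25.2% reduction depends on strobealign details that this sketch does not model.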

AEMB is available as a mapping mode in strobealign (https://github.com/ksahlin/strobealign), and SemiBin2 (v2.1 and later) accepts its output for binning.

Version published to bioRxiv (10.1101/2025.07.30.667338) on Aug 1, 2025

Related articles

Pangenome-guided sequence assembly via binary optimisation

This article has 5 authors:
1. Josh Cudby
2. James Bonfield
3. Chenxi Zhou
4. Richard Durbin
5. Sergii Strelchuk
This article has no evaluations. Latest version Aug 7, 2025

Leviathan : A fast, memory-efficient, and scalable taxonomic and pathway profiler for (pan)genome-resolved metagenomics and metatranscriptomics

This article has 1 author:
1. Josh L. Espinoza
This article has no evaluations. Latest version Jul 18, 2025

Euktect: Enhanced Eukaryotic Sequence Detection and Classification in Metagenomes via the DNA Language Model

This article has 4 authors:
1. Yibo Peng
2. Boyang Ji
3. Yinzhao Wang
4. Hongzhong Lu
This article has no evaluations. Latest version Jun 24, 2025