Evaluation of Metagenome Binning: Advances and Challenges

Abstract

Background

Several recent deep learning methods for metagenome binning claim improvements in the recovery of high-quality metagenome-assembled genomes. These methods differ in how they learn contig embeddings and how they cluster them. Such rapid advances call for rigorous benchmarking to evaluate the effectiveness of new methods. We benchmarked newly developed state-of-the-art deep learning binners, including our own, McDevol, on CAMI2 datasets.

Results

COMEBin and GenomeFace give the best binning accuracy, although not always the best embedding accuracy. Post-binning reassembly consistently improves the quality of low-coverage bins. Binning coassembled contigs with multi-sample coverage is effective for low-coverage datasets, while binning multi-sample contigs with multi-sample coverage (‘multi-sample’) is effective for high-coverage samples. In multi-sample binning, splitting the embedding space by sample before clustering performed better than the standard approach of clustering all contigs together and then splitting the final clusters by sample.
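The difference between the two multi-sample strategies can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `(sample, contig, embedding)` tuples, the toy 2-D embeddings, and the `greedy_cluster` helper, which is a simple stand-in for a real clustering step and not the algorithm used by any of the benchmarked binners.

```python
import math
from collections import defaultdict


def greedy_cluster(points, radius):
    """Toy stand-in for a clustering step (hypothetical, not from any binner):
    assign each point to the first cluster seed within `radius`,
    otherwise start a new cluster with that point as its seed."""
    seeds, labels = [], []
    for p in points:
        for i, seed in enumerate(seeds):
            if math.dist(p, seed) <= radius:
                labels.append(i)
                break
        else:
            seeds.append(p)
            labels.append(len(seeds) - 1)
    return labels


def cluster_then_split(contigs, radius):
    """Standard approach: cluster all contigs together,
    then split each resulting cluster by sample of origin."""
    labels = greedy_cluster([emb for _, _, emb in contigs], radius)
    bins = defaultdict(list)
    for (sample, name, _), label in zip(contigs, labels):
        bins[(sample, label)].append(name)
    return dict(bins)


def split_then_cluster(contigs, radius):
    """Alternative: split the embedding space by sample first,
    then cluster each sample's contigs independently."""
    by_sample = defaultdict(list)
    for contig in contigs:
        by_sample[contig[0]].append(contig)
    bins = defaultdict(list)
    for sample, group in by_sample.items():
        labels = greedy_cluster([emb for _, _, emb in group], radius)
        for (_, name, _), label in zip(group, labels):
            bins[(sample, label)].append(name)
    return dict(bins)


# Hypothetical toy data: (sample ID, contig name, 2-D embedding).
contigs = [
    ("S2", "c1", (1.0, 0.0)),
    ("S1", "a1", (0.0, 0.0)),
    ("S1", "a2", (1.8, 0.0)),
]

joint_bins = cluster_then_split(contigs, radius=1.0)
per_sample_bins = split_then_cluster(contigs, radius=1.0)
```

In this toy data, the contig from sample S2 sits between the two S1 contigs, so clustering everything together places both S1 contigs in one bin, whereas clustering each sample separately keeps them apart. The sketch only shows how the two orderings can produce different bins; it does not by itself show which ordering is more accurate.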

Conclusions

COMEBin and GenomeFace were the top-performing tools overall, while MetaBAT2 and GenomeFace were the fastest. To facilitate future development, we provide workflows for standardized benchmarking of metagenome binners.
