Fairy: fast approximate coverage for multi-sample metagenomic binning


Abstract

Background

Metagenomic binning, the clustering of assembled contigs that belong to the same genome, is a crucial step for recovering metagenome-assembled genomes (MAGs). Contigs are linked by exploiting consistent read coverage patterns across a genome. Using coverage from multiple samples leads to higher-quality MAGs; however, standard pipelines require all-to-all read alignments across samples to compute coverage, which becomes a key computational bottleneck.

Results

We present fairy (https://github.com/bluenote-1577/fairy), an approximate coverage calculation method for metagenomic binning. Fairy is a fast, k-mer-based, alignment-free method. For multi-sample binning, fairy can be >250× faster than read alignment while remaining accurate enough for binning. Fairy is compatible with several existing binners on host-associated and non-host-associated datasets. Using MetaBAT2, fairy recovers 98.5% of MAGs with >50% completeness and <5% contamination relative to alignment with BWA. Notably, multi-sample binning with fairy is always better than single-sample binning using BWA (>1.5× more >50% complete MAGs on average) while still being faster. For a public sediment metagenome project, we demonstrate that multi-sample binning recovers higher-quality Asgard archaea MAGs than single-sample binning and that fairy’s results are indistinguishable from read alignment.
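To make the idea of alignment-free, k-mer-based coverage concrete, the following is a minimal Python sketch. It is not fairy's actual algorithm or API: the function names (`kmers`, `count_read_kmers`, `approx_coverage`, `coverage_matrix`), the use of exact k-mer counting over all k-mers, and the median as the summary statistic are all assumptions made for illustration. It shows only the general principle that a contig's depth in a sample can be approximated from how often its k-mers occur in that sample's reads, without aligning any read.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_read_kmers(reads, k):
    """Count k-mer occurrences across all reads of one sample."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def approx_coverage(contig, read_counts, k):
    """Approximate k-mer depth of a contig as the median read count of its k-mers.

    Illustrative assumption: the median is used because it is robust to a few
    repeated or erroneous k-mers. Returns 0.0 if the contig is shorter than k.
    """
    hits = [read_counts.get(km, 0) for km in kmers(contig, k)]
    return median(hits) if hits else 0.0


def coverage_matrix(contigs, samples, k=21):
    """Build one coverage row per contig, with one column per sample.

    contigs: dict mapping contig name -> sequence
    samples: dict mapping sample name -> list of read sequences
    """
    # Index each sample's reads once, then reuse the index for every contig.
    indexed = {name: count_read_kmers(reads, k) for name, reads in samples.items()}
    return {
        cname: [approx_coverage(seq, indexed[sname], k) for sname in samples]
        for cname, seq in contigs.items()
    }
```

In this sketch, each sample is indexed once and every contig is then looked up against every index, which yields the per-sample coverage rows that a binner consumes. A practical tool would also need to handle reverse complements and sequencing errors and would subsample k-mers to save time and memory; none of that is shown here.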

Conclusions

Fairy is a new tool for approximately and quickly calculating multi-sample coverage for binning, resolving a longstanding computational bottleneck for metagenomics.
