Efficient High-Quality Metagenome Assembly from Long Accurate Reads using Minimizer-space de Bruijn Graphs

Abstract

We introduce a novel metagenomics assembler for high-accuracy long reads. Our approach, implemented as metaMDBG, combines highly efficient de Bruijn graph assembly in minimizer space with both a multi-k′ approach for dealing with variations in genome coverage depth and an abundance-based filtering strategy for simplifying strain complexity. The resulting algorithm is more efficient than the state of the art while producing better assemblies. metaMDBG was 1.5 to 12 times faster than competing assemblers and required between one-tenth and one-thirtieth of the memory across a range of data sets. We obtained up to twice as many high-quality circularised prokaryotic metagenome-assembled genomes (MAGs) on the most complex communities, and a better recovery of viruses and plasmids. metaMDBG performs particularly well for abundant organisms whilst being robust to the presence of strain diversity. As a result, for the first time it is possible to efficiently reconstruct the majority of complex communities by abundance as near-complete MAGs.

Article activity feed

  1. Improved reconstruction of circularised phage and plasmid genomes

    As an aside, it would be interesting to see how well this tool recovers genomes from eukaryotic lineages, since short-read methods produce very fragmented assemblies for them. Some of the metagenomes in this list are from communities with eukaryotes, such as the cheese samples: https://github.com/PacificBiosciences/pb-metagenomics-tools/blob/master/docs/PacBio-Data.md

  2. We grouped MAGs into three conventional categories based on the CheckM results: ‘near-complete’ if its completeness is ≥ 90% and its contamination is ≤ 5%, ‘high-quality’ if completeness ≥ 70% and contamination ≤ 10%, ‘medium quality’ if completeness ≥ 50% and contamination ≤ 10%.

    Did you also take the number of rRNAs/tRNAs into account when assigning these categories, as in the MIMAG/MISAG standards (https://www.nature.com/articles/nbt.3893)?
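
The CheckM-based categories quoted in item 2 can be sketched as a simple threshold check. This is only an illustration of the stated cut-offs, not the authors' code: the function name `classify_mag` and the fallback label `"unclassified"` are assumptions, and the sketch does not include the rRNA/tRNA criteria raised in the comment.

```python
def classify_mag(completeness: float, contamination: float) -> str:
    """Assign a quality category from CheckM completeness/contamination (in %).

    Thresholds follow the quoted text; categories are tested from the
    strictest to the most permissive, so each MAG gets its best label.
    """
    if completeness >= 90 and contamination <= 5:
        return "near-complete"
    if completeness >= 70 and contamination <= 10:
        return "high-quality"
    if completeness >= 50 and contamination <= 10:
        return "medium quality"
    # Hypothetical label for MAGs below every threshold (not in the source).
    return "unclassified"


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    for comp, cont in [(95.0, 2.0), (95.0, 7.0), (60.0, 10.0), (40.0, 1.0)]:
        print(comp, cont, classify_mag(comp, cont))
```

Because the categories are nested, the order of the checks matters: a MAG with 95% completeness and 7% contamination fails the near-complete test on contamination and falls through to high-quality.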