Efficient High-Quality Metagenome Assembly from Long Accurate Reads using Minimizer-space de Bruijn Graphs
This article has been reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
We introduce a novel metagenomics assembler for high-accuracy long reads. Our approach, implemented as metaMDBG, combines highly efficient de Bruijn graph assembly in minimizer space with both a multi-k′ approach for dealing with variations in genome coverage depth and an abundance-based filtering strategy for simplifying strain complexity. The resulting algorithm is more efficient than the state of the art while producing better assembly results. metaMDBG was 1.5 to 12 times faster than competing assemblers and required between one-tenth and one-thirtieth of the memory across a range of data sets. We obtained up to twice as many high-quality circularised prokaryotic metagenome-assembled genomes (MAGs) on the most complex communities, and a better recovery of viruses and plasmids. metaMDBG performs particularly well for abundant organisms whilst being robust to the presence of strain diversity. The result is that, for the first time, it is possible to efficiently reconstruct the majority of complex communities by abundance as near-complete MAGs.
Article activity feed
-
Improved reconstruction of circularised phage and plasmid genomes
This is just an extra thing, but it would be interesting to see how well this tool performs at recovering genomes from eukaryotic lineages since short-read methods produce very fragmented assemblies. Some of the metagenomes in this list are from communities with eukaryotes, such as the cheese samples: https://github.com/PacificBiosciences/pb-metagenomics-tools/blob/master/docs/PacBio-Data.md
-
We grouped MAGs into three conventional categories based on the CheckM results: ‘near-complete’ if its completeness is ≥ 90% and its contamination is ≤ 5%, ‘high-quality’ if completeness ≥ 70% and contamination ≤ 10%, ‘medium quality’ if completeness ≥ 50% and contamination ≤ 10%.
Did you also take the number of rRNAs/tRNAs into consideration, as in categories such as those in MIMAG/MISAG (https://www.nature.com/articles/nbt.3893)?
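For readers following this annotation, the CheckM thresholds quoted above can be written as a small function. This is only a sketch: the name `classify_mag` and the `"unclassified"` fallback label are hypothetical, and it assumes each MAG is assigned to the strictest category it satisfies. It uses completeness and contamination only, so rRNA/tRNA counts are not considered.

```python
def classify_mag(completeness: float, contamination: float) -> str:
    """Assign a MAG to a quality category from CheckM percentages.

    Thresholds are those quoted in the annotation above. Checking the
    strictest category first is an assumption, as is the fallback label.
    """
    if completeness >= 90 and contamination <= 5:
        return "near-complete"
    if completeness >= 70 and contamination <= 10:
        return "high-quality"
    if completeness >= 50 and contamination <= 10:
        return "medium quality"
    # Hypothetical label for MAGs below all three thresholds.
    return "unclassified"


if __name__ == "__main__":
    # Illustrative (completeness, contamination) pairs, not real data.
    for comp, cont in [(95.0, 2.0), (95.0, 7.0), (55.0, 3.0), (40.0, 1.0)]:
        print(comp, cont, classify_mag(comp, cont))
```

Because the categories are nested (every near-complete MAG also meets the high-quality thresholds), the order of the checks determines which label is returned.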
-
Abstract
It would probably help to bring visibility to the tool if the link to the GitHub repository were in the abstract.