Binning unassembled short reads based on k-mer abundance covariance using sparse coding

Abstract

Background

Sequence-binning techniques enable the recovery of an increasing number of genomes from complex microbial metagenomes and typically require prior metagenome assembly, incurring the computational cost and drawbacks of the latter, e.g., biases against low-abundance genomes and inability to conveniently assemble multi-terabyte datasets.

Results

We present here a scalable pre-assembly binning scheme (i.e., operating on unassembled short reads) enabling latent genome recovery by leveraging sparse dictionary learning and elastic-net regularization, and its use to recover hundreds of metagenome-assembled genomes, including very low-abundance genomes, from a joint analysis of microbiomes from the LifeLines DEEP population cohort (n = 1,135, >10^10 reads).

Conclusion

We showed that sparse coding techniques can be leveraged to carry out read-level binning at large scale and that, despite lower genome reconstruction yields compared to assembly-based approaches, bin-first strategies can complement the more widely used assembly-first protocols by targeting distinct genome segregation profiles. Read enrichment levels across 6 orders of magnitude in relative abundance were observed, indicating that the method has the power to recover genomes consistently segregating at low levels.
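As a rough illustration of the idea summarized above (not the authors' implementation), the sketch below counts k-mers per sample, then encodes one k-mer's cross-sample abundance profile as a sparse combination of dictionary atoms under an elastic-net penalty, and assigns the k-mer to the atom with the largest coefficient. Everything here is an assumption made for the example: the function names, the tiny fixed dictionary (the method described learns its dictionary from data), the penalty values, and the plain coordinate-descent solver.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_matrix(samples, k):
    """Return (kmers, rows); rows[i][j] is the count of kmers[i] in sample j."""
    per_sample = [kmer_counts(reads, k) for reads in samples]
    kmers = sorted(set().union(*per_sample))
    rows = [[float(c[km]) for c in per_sample] for km in kmers]
    return kmers, rows


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    return max(z - t, 0.0) - max(-z - t, 0.0)


def elastic_net_code(x, atoms, l1=0.1, l2=0.1, n_iter=200):
    """Coordinate descent for
    min_a 0.5*||x - sum_j a[j]*atoms[j]||^2 + l1*||a||_1 + 0.5*l2*||a||^2.
    The dictionary `atoms` is held fixed (hypothetical; it is not learned here).
    """
    a = [0.0] * len(atoms)
    residual = list(x)  # x minus the current reconstruction
    for _ in range(n_iter):
        for j, d in enumerate(atoms):
            norm_sq = sum(v * v for v in d)
            # Correlation of atom j with the residual, excluding its own term.
            rho = sum(dv * rv for dv, rv in zip(d, residual)) + norm_sq * a[j]
            new = soft_threshold(rho, l1) / (norm_sq + l2)
            delta = new - a[j]
            if delta:
                residual = [rv - delta * dv for rv, dv in zip(residual, d)]
                a[j] = new
    return a


def assign_bin(code):
    """Index of the atom with the largest coefficient."""
    return max(range(len(code)), key=code.__getitem__)


if __name__ == "__main__":
    # Two hypothetical abundance patterns across four samples.
    atoms = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
    profile = [3.0, 0.0, 3.0, 0.0]  # a k-mer seen only in samples 0 and 2
    code = elastic_net_code(profile, atoms)
    print(code, assign_bin(code))
```

The L1 term drives most coefficients to exactly zero, so each k-mer is explained by few atoms, while the L2 term keeps the solution stable when atoms are correlated. At the scale reported in the abstract, the k-mer by sample matrix would not be built from in-memory counters as it is in this toy version.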

Article activity feed

  1. Now published in GigaScience doi: 10.1093/gigascience/giaa028

    Olexiy Kyrgyzov (1), Vincent Prost (1, 2), Stéphane Gazut (2), Bruno Farcy (3), Thomas Brüls (1)

    1 CEA Genoscope, Evry, France
    2 CEA LIST, Gif-sur-Yvette, France
    3 Bull Technologies, Les Clayes-sous-Bois, France

    For correspondence: bruls@genoscope.cns.fr

    A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giaa028), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

    These peer reviews were as follows:

    Reviewer 1: http://dx.doi.org/10.5524/REVIEW.102154
    Reviewer 2: http://dx.doi.org/10.5524/REVIEW.102155