Decomposing a San Francisco estuary microbiome using long-read metagenomics reveals species- and strain-level dominance from picoeukaryotes to viruses

This article has been reviewed.

Abstract

Although long-read sequencing has made it possible to obtain high-quality, complete genomes from metagenomes, many challenges remain in completely decomposing a metagenome into its constituent prokaryotic and viral genomes. This study focuses on decomposing an estuarine metagenome to obtain a more accurate estimate of microbial diversity. To achieve this, we developed a new bead-based DNA extraction method and a novel bin refinement method, and we generated 150 Gbp of Nanopore sequencing data. We estimate that there are ~500 bacterial and archaeal species in our sample, and we obtained 68 high-quality bins (>90% complete, <5% contamination, ≤5 contigs, contig length of >100 kbp, and all ribosomal and tRNA genes). We also obtained many contigs from picoeukaryotes, environmental DNA from larger eukaryotes such as mammals, and complete mitochondrial and chloroplast genomes, and we detected ~40,000 viral populations. Our analysis indicates that only a few strains account for most of each species' abundance.
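The high-quality bin criteria listed in the abstract can be expressed as a single predicate. The sketch below is a minimal illustration, not the authors' pipeline. The `Bin` record, its field names, and the `is_high_quality` function are hypothetical. Two readings of the criteria are also assumptions: that "contig length of >100 kbp" applies to every contig in a bin, and that "all ribosomal and tRNA genes" means the 5S, 16S, and 23S rRNAs plus tRNAs for all 20 standard amino acids.

```python
from dataclasses import dataclass, field

# Assumption: "all ribosomal genes" means the 5S, 16S, and 23S rRNAs.
REQUIRED_RRNA = frozenset({"5S", "16S", "23S"})
# Assumption: "all tRNA genes" means a tRNA for each of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = frozenset({
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
})


@dataclass
class Bin:
    """Hypothetical summary of one bin; field names are illustrative only."""
    name: str
    completeness: float        # percent, e.g. from a CheckM-style estimate
    contamination: float       # percent
    contig_lengths: list[int]  # length of each contig in bp
    rrna_genes: set[str] = field(default_factory=set)
    trna_amino_acids: set[str] = field(default_factory=set)


def is_high_quality(b: Bin) -> bool:
    """Return True if the bin meets every threshold stated in the abstract."""
    return (
        b.completeness > 90.0
        and b.contamination < 5.0
        and 1 <= len(b.contig_lengths) <= 5
        # Assumption: every contig in the bin must exceed 100 kbp.
        and all(length > 100_000 for length in b.contig_lengths)
        # Subset checks: all required genes must be present in the bin.
        and REQUIRED_RRNA <= b.rrna_genes
        and STANDARD_AMINO_ACIDS <= b.trna_amino_acids
    )


good = Bin("bin_01", 97.5, 1.2, [2_400_000, 350_000],
           {"5S", "16S", "23S"}, set(STANDARD_AMINO_ACIDS))
fragmented = Bin("bin_02", 95.0, 0.8, [600_000] * 6,
                 {"5S", "16S", "23S"}, set(STANDARD_AMINO_ACIDS))
print([b.name for b in (good, fragmented) if is_high_quality(b)])  # ['bin_01']
```

Here `bin_02` fails only because it has six contigs. Keeping each threshold as a separate clause makes it straightforward to report which criterion a rejected bin misses.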

IMPORTANCE

Ocean and estuarine microbiomes play critical roles in global element cycling and ecosystem function. Despite the importance of these microbial communities, many of their species have still not been cultured in the lab. Environmental sequencing is the primary way to study the function and population dynamics of these communities. Long-read sequencing offers a way to overcome the limitations of short-read technologies and obtain complete microbial genomes, but it comes with its own technical challenges, such as the required sequencing depth and the need for high-quality DNA. Here, we present new sampling and bioinformatics methods in an attempt to decompose an estuarine microbiome into its constituent genomes. Our results suggest that, from viruses to picoeukaryotes, only a few strains account for most of each species' abundance, and that fully decomposing a metagenome of this diversity requires 1 Tbp of long-read sequencing. We anticipate that less sequencing will be needed as long-read sequencing technologies continue to improve.

Article activity feed

  1. Thanks for the lovely paper! I have some comments on the section "Binning of Prokaryotic Genomes and Bin Refinement." First, I'm curious whether you tried to decontaminate any bins. This could be an interesting step, whether a bin initially had contamination above or below 5%. I think GUNC can be used for this, and I've also contributed to a software tool called charcoal that could work for it. Second, I have concerns about the bin refinement you performed. I think assemblers (especially short-read assemblers, so this may be less relevant here) typically break contiguous sequences into fragments when sequencing is incomplete or when strain variation causes a bifurcation in the underlying graph. Could your refinement technique be merging contigs that would never be joined in nature? It could be interesting to inspect the underlying assembly graph, from either the short or the long reads, for some of these merges. I see you consider this in the paragraph beginning "This bin refinement method allows the merging of contigs..." Still, I think it is worth additional investigation to ensure this is not happening, because the methods you use here could set a precedent for long-read metagenomes (especially those sequenced without accompanying Hi-C data).

    Thanks for this effort; I really enjoyed your preprint. I loved the note at the beginning about the length of the preprint :)