Recovery of highly contiguous genomes from complex terrestrial habitats reveals over 15,000 novel prokaryotic species and expands characterization of soil and sediment microbial communities
Abstract
Genomes are fundamental to understanding microbial ecology and evolution. The emergence of high-throughput, long-read DNA sequencing has enabled recovery of microbial genomes from environmental samples at scale. However, expanding the microbial genome catalogue of soils and sediments has been challenging due to the enormous complexity of these environments. Here, we performed deep, long-read Nanopore sequencing of 154 soil and sediment samples collected across Denmark and, through an optimised bioinformatics pipeline, recovered genomes of 15,314 novel microbial species, including 4,757 high-quality genomes. The recovered microbial genomes span 1,086 novel genera and provide the first high-quality reference genomes for 612 previously known genera, expanding the phylogenetic diversity of the prokaryotic tree of life by 8%. The long-read assemblies also enabled the recovery of thousands of complete rRNA operons, biosynthetic gene clusters and CRISPR-Cas systems, all of which were underrepresented and highly fragmented in previous terrestrial genome catalogues. Furthermore, incorporating the recovered metagenome-assembled genomes (MAGs) into public genome databases significantly improved species-level classification rates for soil and sediment metagenomic datasets, thereby enhancing terrestrial microbiome characterization. With this study, we demonstrate that long-read sequencing and optimised bioinformatics allow cost-effective recovery of high-quality microbial genomes from highly complex ecosystems, which remain the largest untapped source of biodiversity for expanding genome databases and filling in the gaps of the tree of life.