Shotgun metagenomics: a deep insight into the composition and function of the complex microbial world
Abstract
Background: Two culture-independent methods, amplicon-based sequencing and shotgun metagenomics, have significantly advanced the study of microbial communities. To date, short-read sequencing technologies have provided high accuracy and deep coverage, while long-read sequencing approaches are increasingly being applied to improve genome assembly, despite challenges related to sequencing accuracy and nucleic acid input requirements. In this benchmark study, we compared the shotgun metagenomics approach across three sequencing technologies, Illumina (short reads) and PacBio and Nanopore (long reads), using a commercial mock microbial community consisting of 20 known species. Specifically, we evaluated how effectively the data generated by each platform reconstructed and identified the known taxa and captured their genetic and functional potential, considering annotated genes, length of predicted proteins, and number and types of inferred functions.

Results: Illumina sequencing provided high-throughput, high-quality data, but its limited read length precluded complete genome assembly. This affected the functional analysis, leading to an underestimation of coding and non-coding genes. Nanopore sequencing yielded the longest reads, resulting in more contiguous assemblies, although its results were affected by higher error rates and by the choice of assembly method. PacBio offered the best balance between read length and base accuracy, but produced fewer reads. This reduced genome coverage for a few specific taxa, influencing the quality of their assemblies, the completeness of their MAGs (Metagenome-Assembled Genomes), and the accuracy of their functional annotation. Nevertheless, PacBio successfully retrieved MAGs for all mock community species, and the genome annotations were consistent with the reference.

Conclusions: This study offers a framework to guide the selection of sequencing strategies in metagenomic research. Understanding the strengths and limitations of each step of a metagenomic workflow, from library preparation to bioinformatic analysis, is crucial for driving its ongoing optimization.
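
The abstract compares platforms partly by how contiguous their assemblies are, but does not say which metric was used. As an illustration only, the sketch below computes N50, a widely used contiguity summary: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. The function name `n50` and the toy contig lengths are assumptions made for this example, not data or code from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length of the contig at which the running total of
    lengths (longest contig first) reaches at least half of the
    total assembly length. Returns 0 for an empty assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # no contigs supplied


# Hypothetical toy assemblies of the same total size (10 units):
fragmented = [2, 2, 2, 2, 2]   # many short contigs
contiguous = [10]              # one long contig

print(n50(fragmented), n50(contiguous))
```

With the same total length, the assembly made of fewer, longer contigs has the higher N50. The metric says nothing about correctness, so in a comparison like the one described above it would be read alongside error rates and MAG completeness rather than on its own.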