Whole Metagenome Sequencing: not Deep Enough for Complete Microbial Function Recovery
Abstract
Background
Whole metagenome shotgun sequencing (WMS) is widely used to profile microbial function. However, technical variability in sequencing and analysis often obscures true biological patterns. Large-scale studies are particularly susceptible to batch effects, such as differences in sequencing depth, sequencing platform, annotation strategy, and sample-to-flow-cell assignment. Yet the relative effects of these factors on functional inference in such studies have not been systematically evaluated.
We analyzed oral-rinse WMS data from a study cohort including 671 Nigerian youths aged 9-18, sequenced on two Illumina platforms. The microbial molecular functionality encoded in these data was annotated with two pipelines: mi-faser/Fusion, to capture the broad functional repertoire, and HUMAnN 3/EC numbers, to characterize curated enzymatic activities. We then quantified how technical factors and batch effects shaped the recovery of microbial functionality.
Results
Three findings were most salient. First, the choice of annotation strategy traded off breadth against specificity of functional coverage. Second, low-prevalence functions were disproportionately lost at shallow sequencing depths, indicating that sequencing depth could critically affect study resolution in, for example, case-control studies with few representatives of the minor class. Finally, using our newly developed model relating sequencing depth to functional recovery, we demonstrated that increasing sequencing depth does not directly or proportionally improve functional recall. Specifically, at as little as 10% of this study’s sequencing depth, 30% of the estimated complete microbiome functional repertoire was detectable. However, even at the full depth used in this study, we were only able to recover an estimated 60% of that complete functional repertoire.
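The disproportionate loss of rare functions at shallow depth can be illustrated with a simple read-thinning simulation. The sketch below is not the depth-to-function model described above; it is a minimal toy example in which each read is kept independently with probability equal to the target depth fraction. The function name `detected_fraction` and the `toy_counts` values are hypothetical and chosen only for illustration.

```python
import random


def detected_fraction(counts, fraction, rng):
    """Share of originally present functions that keep at least one read
    after each read is retained independently with probability `fraction`."""
    present = [c for c in counts if c > 0]
    detected = 0
    for c in present:
        # A function counts as detected if any of its reads survives thinning.
        if any(rng.random() < fraction for _ in range(c)):
            detected += 1
    return detected / len(present)


# Hypothetical toy sample (assumption, not study data):
# 20 abundant functions with 1000 reads each, 80 rare functions with 2 reads each.
toy_counts = [1000] * 20 + [2] * 80

if __name__ == "__main__":
    rng = random.Random(0)
    for fraction in (0.1, 0.5, 1.0):
        share = detected_fraction(toy_counts, fraction, rng)
        print(f"depth fraction {fraction:.1f}: {share:.2f} of functions detected")
```

In this toy setting the abundant functions survive even heavy thinning, while most rare functions drop out, so the detected share falls much faster than an even loss across functions would suggest.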
Conclusions
Together, these findings and our depth-to-function mapping framework provide practical guidelines for the design and interpretation of WMS studies. Coordinating sequencing depth planning with annotation strategy, experimental design, and rigorous batch control is thus essential for robust detection of microbial functions and for ensuring reproducible microbiome insights.