Quantitative evaluation of microbiome sequencing resolution under varying experimental conditions using defined mock communities
Abstract
Background: Objective evaluation of sequencing resolution is crucial for comparing technologies and ensuring reproducibility in microbiome analysis. In particular, a systematic approach is needed to quantify how sequencing platforms and experimental conditions affect species-level resolution. This study therefore quantitatively evaluated multiple strategies, including 16S V3–V4 (16P), full-length 16S rRNA gene (16F), and whole metagenome shotgun sequencing (WMS), using a commercial DNA-based mock community (MC) and a domestically developed whole-cell MC (Korea MC [KMC]). The WMS strategy included 12 combinations of input DNA concentrations and sequencing output levels. A total of 64 WMS libraries were constructed for KMC samples, and 112 sequencing datasets were analysed. Taxonomic resolution was assessed using an adjusted F1-score that integrates detection sensitivity and abundance-level reproducibility.

Results: A qualitative comparison of detected against expected species across platforms showed a true positive abundance ratio above 90% for WMS, an average of 60% for 16F, and an average below 10% for 16P. The combination of 10 ng input and 10 gigabases output consistently yielded the highest species-level resolution. However, performance was reduced in some MCs under 1 ng or 100 ng DNA input conditions. Detection sensitivity varied by taxon and condition. Specifically, Streptococcus pneumoniae and Cryptococcus neoformans were detected only under high-input or high-output conditions, whereas Escherichia coli exhibited optimal accuracy at intermediate inputs. Acinetobacter species showed reduced resolution as input DNA increased. KMC samples showed species- and format-specific variability in DNA extraction efficiency.

Conclusions: This study establishes a quantitative framework for assessing species-level resolution across sequencing conditions and taxa using defined MCs. The findings provide practical guidance for selecting sequencing strategies aligned with analytical objectives and resource constraints.
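
The abstract names two kinds of measurement: a true positive abundance ratio and an F1-based resolution score. The sketch below illustrates how such quantities could be computed for one mock-community sample. It is a minimal illustration under stated assumptions, not the authors' method: it uses the standard (unadjusted) F1-score, because the abstract does not specify how the adjustment for abundance-level reproducibility is made. The function name `detection_metrics`, the `min_abundance` threshold, and the toy species labels and abundances are all hypothetical.

```python
def detection_metrics(expected, observed, min_abundance=0.0):
    """Compare observed species abundances against an expected species list.

    expected      -- set of species names known to be in the mock community
    observed      -- dict mapping species name -> relative abundance
    min_abundance -- hypothetical cutoff; species at or below it count as undetected
    """
    detected = {sp for sp, ab in observed.items() if ab > min_abundance}
    true_pos = detected & expected

    # Precision: share of detected species that were expected.
    precision = len(true_pos) / len(detected) if detected else 0.0
    # Recall (detection sensitivity): share of expected species that were detected.
    recall = len(true_pos) / len(expected) if expected else 0.0
    # Standard F1: harmonic mean of precision and recall (NOT the adjusted score).
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # True positive abundance ratio: share of detected abundance
    # that is assigned to expected species.
    total = sum(observed[sp] for sp in detected)
    tp_ratio = sum(observed[sp] for sp in true_pos) / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "tp_abundance_ratio": tp_ratio,
    }


# Toy data (made up for illustration, not taken from the study).
expected_species = {"species_A", "species_B", "species_C", "species_D"}
observed_profile = {
    "species_A": 0.5,
    "species_B": 0.3,
    "species_C": 0.1,
    "species_X": 0.1,  # false positive: not in the mock community
}

metrics = detection_metrics(expected_species, observed_profile)
print(metrics)
```

In this toy case three of four expected species are detected alongside one unexpected species, so precision, recall, and F1 all come out near 0.75, while about 90% of the detected abundance belongs to expected species. A score that also reflects abundance-level reproducibility would need replicate measurements, which this sketch does not model.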