A theoretical and experimental framework enables low-coverage sequencing for accurate quantification of genome-wide cytosine modification levels
Abstract
5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) regulate gene expression, and their levels change dynamically during development and disease. While high-depth, base-resolution studies offer the most detailed view of epigenetic landscapes, many open questions can be answered by surveying changes in 5mC/5hmC levels across larger cohorts. However, current global quantification methods, including mass spectrometry, are typically limited in accessibility, accuracy, or throughput. Here, to evaluate the viability of low-coverage sequencing as an alternative, we first computationally downsampled deeply sequenced data to resolve the three-way relationship between sequencing coverage, modification level, and measurement error. This relationship allowed us to develop a simple online tool for error calculation and to define experimental targets: <0.24% genome coverage can quantify 5mC and low-abundance 5hmC with minimal and predictable errors (<5%). Importantly, in direct comparisons, low-depth sequencing (Sparse-Seq) showed high accuracy and lower variability than mass spectrometry while, unlike mass spectrometry, preserving genomic context. Applied serially to developing mouse brains, Sparse-Seq revealed that 5hmCpG emerges earlier than 5mCpH and uncovered previously overlooked epigenetic changes specific to particular genomic features. This work establishes a rigorous foundation for using Sparse-Seq as a highly accessible approach to 5mC/5hmC quantification, enabling economical first-pass analysis of epigenetic landscapes suited to large cohort studies and new hypothesis generation.
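To make the coverage, modification level, and error relationship concrete, here is a minimal sketch under a simple assumption of our own: each covered cytosine contributes one independent modified/unmodified call, so the global level is a binomial proportion. This is not the authors' model or their online tool, and the function names (`expected_relative_error`, `calls_needed`, `downsampled_relative_error`) and the synthetic "deep" pool are hypothetical. The sketch computes the expected relative error for a given number of calls, solves for the number of calls needed to hit an error target, and checks that against repeated downsampling of a deep pool.

```python
import math
import random
import statistics


def expected_relative_error(mod_level: float, n_calls: int) -> float:
    """Binomial approximation (assumption) of the relative standard error of a
    global modification estimate built from n_calls independent cytosine calls."""
    if not 0.0 < mod_level < 1.0 or n_calls <= 0:
        raise ValueError("mod_level must be in (0, 1) and n_calls > 0")
    standard_error = math.sqrt(mod_level * (1.0 - mod_level) / n_calls)
    return standard_error / mod_level


def calls_needed(mod_level: float, target_rel_error: float) -> int:
    """Smallest n with sqrt(p(1-p)/n) / p <= e, i.e. n >= (1-p) / (p * e^2)."""
    if not 0.0 < mod_level < 1.0 or target_rel_error <= 0.0:
        raise ValueError("mod_level must be in (0, 1) and target > 0")
    return math.ceil((1.0 - mod_level) / (mod_level * target_rel_error ** 2))


def downsampled_relative_error(deep_calls: list[int], n_calls: int,
                               n_reps: int = 300, seed: int = 0) -> float:
    """Empirical relative error: repeatedly draw n_calls calls without
    replacement from a deep pool of 0/1 calls (1 = modified) and report the
    spread of the resulting estimates relative to the pool's true level."""
    rng = random.Random(seed)
    true_level = sum(deep_calls) / len(deep_calls)
    estimates = [sum(rng.sample(deep_calls, n_calls)) / n_calls
                 for _ in range(n_reps)]
    return statistics.stdev(estimates) / true_level


if __name__ == "__main__":
    p, target = 0.05, 0.05  # hypothetical low-abundance mark and 5% error goal
    n = calls_needed(p, target)
    # Hypothetical deep pool: 500,000 calls, 5% of them modified.
    pool = [1] * 25_000 + [0] * 475_000
    print(n, expected_relative_error(p, n), downsampled_relative_error(pool, n))
```

Under this model the required number of calls scales as (1 - p) / (p e^2), so rarer marks such as 5hmC need proportionally more calls for the same relative error; translating calls into percent genome coverage would additionally depend on how many cytosines in the relevant context a given depth captures.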