A theoretical and experimental framework enables low-coverage sequencing for accurate quantification of genome-wide cytosine modification levels

Abstract

5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) regulate gene expression and exhibit dynamic levels during development and disease. While high-depth, base-resolution studies offer the most detailed view of epigenetic landscapes, many open questions are answered by surveying changes in 5mC/5hmC levels across larger cohorts. Nonetheless, current global quantification methods, including mass spectrometry, are typically limited in accessibility, accuracy, or throughput. Here, to evaluate the viability of low-coverage sequencing as an alternative, we first computationally downsampled deeply sequenced data to resolve the three-way relationship between sequencing coverage, modification levels, and measurement error. This relationship allowed us to develop a facile online tool for error calculation and to define experimental targets: <0.24% genome coverage can quantify 5mC and low-abundance 5hmC with minimal and predictable errors (<5%). Importantly, in direct comparisons, low-depth sequencing (Sparse-Seq) demonstrated high accuracy and less variability than mass spectrometry, while distinctively preserving genomic context. Applied serially to developing mouse brains, Sparse-Seq revealed an earlier emergence of 5hmCpG compared to 5mCpH and uncovered previously overlooked, genomic feature-specific epigenetic changes. This work establishes a rigorous foundation for employing Sparse-Seq as a highly accessible approach for 5mC/5hmC quantification, enabling economical first-pass analysis of epigenetic landscapes suited for large cohort studies and new hypothesis generation.
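
The three-way relationship between coverage, modification level, and measurement error described above can be illustrated with a simple sampling model. The sketch below is not the authors' procedure or their online tool. It assumes that each sequenced cytosine call is an independent Bernoulli draw with probability equal to the true global modification level, so the estimated level follows a binomial distribution. The function names (`relative_standard_error`, `calls_needed`, `simulate_downsampling`) are hypothetical, and "error" here means the relative standard error of the estimate, which may differ from the error definition used in the study. No attempt is made to convert a genome coverage figure into a number of cytosine calls.

```python
import math
import random


def relative_standard_error(level, n_calls):
    """Relative standard error of the estimated modification level.

    Assumes a binomial model: n_calls independent cytosine calls, each
    modified with probability `level` (0 < level <= 1).
    """
    if not 0.0 < level <= 1.0:
        raise ValueError("level must be in (0, 1]")
    if n_calls <= 0:
        raise ValueError("n_calls must be positive")
    # SE of the proportion is sqrt(p(1-p)/n); dividing by p gives the relative error.
    return math.sqrt((1.0 - level) / (level * n_calls))


def calls_needed(level, target_rel_error):
    """Smallest number of calls giving at most the target relative error."""
    if not 0.0 < level <= 1.0:
        raise ValueError("level must be in (0, 1]")
    if target_rel_error <= 0:
        raise ValueError("target_rel_error must be positive")
    return math.ceil((1.0 - level) / (level * target_rel_error ** 2))


def simulate_downsampling(level, n_calls, n_trials=2000, seed=0):
    """Empirical relative error from repeated simulated shallow samples.

    Each trial draws n_calls Bernoulli calls and estimates the level as the
    modified fraction; the spread of the estimates is reported relative to
    the true level.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        modified = sum(1 for _ in range(n_calls) if rng.random() < level)
        estimates.append(modified / n_calls)
    mean = sum(estimates) / n_trials
    variance = sum((e - mean) ** 2 for e in estimates) / (n_trials - 1)
    return math.sqrt(variance) / level


if __name__ == "__main__":
    # Illustrative levels only; these are not values taken from the study.
    for level in (0.7, 0.05, 0.005):
        print(level, calls_needed(level, 0.05), relative_standard_error(level, 100_000))
```

Under this model, the number of calls required for a fixed relative error grows roughly in inverse proportion to the modification level, which is consistent with low-abundance marks needing more data than abundant ones. Real data can depart from the binomial assumption (for example through conversion inefficiency or uneven coverage), so the closed-form value is best read as a rough guide.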
