Dissecting sequence determinants of DNA methylation and in silico perturbation

Abstract

DNA methylation shapes cellular identity and genome function, yet the sequence logic that governs its establishment remains largely unresolved. Here we present MethylAI, a cross-species–pretrained and human-specialized deep learning framework that predicts single-CpG methylation states directly from genomic sequence with high fidelity. Through quantitative attribution, MethylAI uncovers intrinsic cis-regulatory principles embedded in DNA, revealing conserved transcription factor (TF) motifs whose influence extends beyond CpG composition. Activated motifs demarcate gene-body CpG islands, encode lineage-defining methylation states, and align with TF occupancy and diverse histone modifications, thereby linking sequence syntax to local chromatin regulation. CTCF perturbation experiments validated MethylAI-predicted methylation shifts at activated motifs, showing strong concordance with in silico perturbations. Mapping genetic variants onto methylation-linked active motifs further uncovered widespread connections to zinc-finger TF families, providing mechanistic insight into how noncoding variation contributes to human traits and diseases. By transforming DNA sequence into interpretable methylation grammar, MethylAI establishes a generalizable framework for decoding the regulatory architecture of the human epigenome.
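The in silico perturbation idea mentioned above can be illustrated with a minimal sketch: score a reference sequence, replace a motif window with a same-length substitute, score again, and report the difference. Everything in this sketch is an assumption made for illustration. The function `toy_methylation_model` is a hypothetical stand-in with hand-picked weights, not MethylAI or its interface, and the "CCCTC" string is only a simplified placeholder for a CTCF-like motif core.

```python
import math


def toy_methylation_model(seq: str) -> float:
    """Hypothetical stand-in for a sequence-to-methylation predictor.

    This is NOT MethylAI. It is a toy rule with arbitrary weights:
    more CpG dinucleotides raise the score, and each occurrence of a
    simplified CTCF-like core ("CCCTC") lowers it.
    """
    cpg_count = seq.count("CG")
    motif_hits = seq.count("CCCTC")
    logit = 1.0 + 0.1 * cpg_count - 2.0 * motif_hits
    return 1.0 / (1.0 + math.exp(-logit))  # squash to a 0..1 methylation level


def perturb(seq: str, start: int, end: int, replacement: str) -> str:
    """Replace seq[start:end] with a same-length replacement."""
    if len(replacement) != end - start:
        raise ValueError("replacement must preserve sequence length")
    return seq[:start] + replacement + seq[end:]


def perturbation_effect(model, seq: str, start: int, end: int, replacement: str) -> float:
    """Return predicted methylation after perturbation minus before."""
    reference = model(seq)
    altered = model(perturb(seq, start, end, replacement))
    return altered - reference


# Example: disrupt the toy motif at positions 4..9 and inspect the shift.
sequence = "ACGT" + "CCCTC" + "ACGTACGT"
delta = perturbation_effect(toy_methylation_model, sequence, 4, 9, "AAAAA")
print(f"predicted methylation shift: {delta:+.3f}")
```

Keeping the replacement the same length as the window means every other position, including the CpG of interest, stays at the same coordinate, so the difference in scores can be attributed to the edited window alone. With a real trained predictor in place of the toy function, the same loop would apply unchanged.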