mIDEA: An Interpretable Structure–Sequence Model for Methylation-Dependent Protein–DNA Binding Sensitivity
Abstract
DNA methylation dynamically reshapes protein–DNA interaction landscapes, yet the mechanistic principles governing methylation-dependent recognition remain incompletely understood. Currently, there is no predictive framework that integrates methylation-aware binding specificity with 3D structure-based residue interactions for quantitatively assessing the impact of DNA methylation on protein–DNA interactions. To bridge this gap, we introduce mIDEA (Methylation-informed Interpretable protein–DNA Energy Associative model), a structure-aware, residue-level biophysical framework that predicts and interprets protein–methylated-DNA interactions by optimizing an amino acid–nucleotide energy matrix explicitly accounting for the modulatory effect of 5-methylcytosine. We validate mIDEA across diverse human transcription factors with distinct CpG methylation preferences. Using only structural templates and prior biochemical knowledge, mIDEA accurately recapitulates both “methyl-plus” and “methyl-minus” effects of DNA methylation on protein–DNA binding. Incorporating quantitative methylation-sensitive binding data further enables mIDEA to capture position-specific methylation effects. When integrated with whole-genome methylation profiles, the model enhances in vivo binding-site prediction by reducing false positives while preserving sensitivity compared to models trained on unmethylated sequences. Overall, mIDEA provides a principled framework for elucidating the molecular mechanisms underlying methylation-modulated protein–DNA recognition. The mIDEA source code is available at: https://github.com/LinResearchGroup-NCSU/IDEA_DNA_Methylation .
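The idea of an amino acid–nucleotide energy matrix that accounts for 5-methylcytosine can be illustrated with a minimal sketch. This is not the mIDEA implementation. It assumes a hypothetical five-letter DNA alphabet in which `M` stands for 5-methylcytosine, a contact list taken from some structural template, and a convention that lower energy means stronger binding. The function names, the contact format, and the numerical values in the matrix are placeholders invented for illustration only; the real encoding and optimization procedure are in the linked repository.

```python
import numpy as np

# Assumed alphabets: 20 standard amino acids and four bases plus
# 'M' as a hypothetical symbol for 5-methylcytosine.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUCLEOTIDES = "ACGTM"

AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
NT_INDEX = {nt: i for i, nt in enumerate(NUCLEOTIDES)}


def binding_energy(gamma, contacts, dna):
    """Sum the matrix entries gamma[aa, nt] over residue-nucleotide contacts.

    gamma    : array of shape (20, 5), the amino acid-nucleotide energy matrix
    contacts : list of (amino_acid, dna_position, weight) tuples, assumed to
               come from a structural template
    dna      : DNA sequence over the alphabet ACGTM
    """
    total = 0.0
    for aa, pos, weight in contacts:
        total += weight * gamma[AA_INDEX[aa], NT_INDEX[dna[pos]]]
    return total


def methylation_effect(gamma, contacts, dna, position):
    """Energy change when the cytosine at `position` is methylated.

    Under the assumed sign convention, a positive value means methylation
    weakens binding and a negative value means it strengthens binding.
    """
    if dna[position] != "C":
        raise ValueError("position must point at an unmethylated cytosine")
    methylated = dna[:position] + "M" + dna[position + 1:]
    return (binding_energy(gamma, contacts, methylated)
            - binding_energy(gamma, contacts, dna))


# Toy matrix with arbitrary placeholder values (not fitted to any data).
gamma = np.zeros((len(AMINO_ACIDS), len(NUCLEOTIDES)))
gamma[AA_INDEX["R"], NT_INDEX["C"]] = -1.0
gamma[AA_INDEX["R"], NT_INDEX["M"]] = -0.2
gamma[AA_INDEX["V"], NT_INDEX["C"]] = -0.1
gamma[AA_INDEX["V"], NT_INDEX["M"]] = -0.8

dna = "ACGT"
weakened = methylation_effect(gamma, [("R", 1, 1.0)], dna, 1)
strengthened = methylation_effect(gamma, [("V", 1, 1.0)], dna, 1)
print(f"residue R contact: {weakened:+.2f}")
print(f"residue V contact: {strengthened:+.2f}")
```

In this toy setup the same cytosine gives an energy change of opposite sign depending on which residue contacts it, which is the kind of "methyl-plus" versus "methyl-minus" distinction the abstract describes. In the actual model the matrix entries are optimized rather than set by hand.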