scDNAm-GPT Captures Genome-wide CpG Dependencies in Single-cell DNA methylomes to Revolutionize Epigenetic Analysis
Abstract
Single-cell DNA methylomes are challenging to interpret because of sparse CpG coverage and the complexity of genome-wide sequences. We present scDNAm-GPT, a universal foundation model that uses context-aware CpG tokenization, a Mamba backbone, and cross-attention to capture both local and global DNA methylation patterns. Trained on over one million single cells from 35 human and mouse tissues, scDNAm-GPT enables accurate cell clustering, zero-shot prediction of CpG effects on gene expression, improved trajectory inference, and reference-free deconvolution of cell types from cell-free DNA. The model hierarchically learns regulatory features, and its attention maps highlight functionally relevant regions, demonstrating high biological interpretability. These results establish scDNAm-GPT as a scalable and generalizable framework for single-cell epigenomic analysis, offering new opportunities to dissect epigenetic regulation in health and disease. Code is available on GitHub (https://github.com/ChaoqiLiang/scDNAm-GPT).
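As a rough illustration of what context-aware CpG tokenization could look like, the sketch below turns each covered CpG site into a token that combines its flanking sequence with its methylation call. The function name `tokenize_cpgs`, the flank width, the input format, and the token format are all assumptions made for illustration; none of them are taken from the scDNAm-GPT code base, and the model's actual tokenizer may differ.

```python
from typing import Dict, List


def tokenize_cpgs(sequence: str, calls: Dict[int, int], flank: int = 2) -> List[str]:
    """Build one token per covered CpG site (illustrative sketch only).

    sequence: reference DNA string (hypothetical input format).
    calls: maps the 0-based position of the C in a CpG to 0 (unmethylated)
           or 1 (methylated); uncovered sites are simply absent, which is
           how sparse single-cell coverage is represented here.
    flank: number of bases kept on each side of the CG dinucleotide (assumed).
    """
    seq = sequence.upper()
    tokens = []
    for pos in sorted(calls):
        # Skip positions that are not actually a CpG in the reference.
        if seq[pos:pos + 2] != "CG":
            continue
        start, end = pos - flank, pos + 2 + flank
        # Skip sites too close to the sequence edge for a full context window.
        if start < 0 or end > len(seq):
            continue
        state = "M" if calls[pos] == 1 else "U"
        tokens.append(f"{seq[start:end]}_{state}")
    return tokens


if __name__ == "__main__":
    ref = "ATACGTTGGCGAAT"
    # Two covered CpGs: position 3 methylated, position 9 unmethylated.
    print(tokenize_cpgs(ref, {3: 1, 9: 0}))
```

Because only covered sites produce tokens, the resulting sequence length tracks the number of observed CpGs rather than the genome length, which is one plausible way a sequence model could cope with sparse coverage.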