scDNAm-GPT Captures Genome-wide CpG Dependencies in Single-cell DNA methylomes to Revolutionize Epigenetic Analysis

Abstract

Single-cell DNA methylomes are challenging to interpret because of sparse CpG coverage and the complexity of genome-wide sequences. We present scDNAm-GPT, a universal foundation model that uses context-aware CpG tokenization, a Mamba backbone, and cross-attention to capture both local and global DNA methylation patterns. Trained on over one million single cells from 35 human and mouse tissues, scDNAm-GPT enables accurate cell clustering, zero-shot prediction of CpG effects on gene expression, improved trajectory inference, and reference-free deconvolution of cell types from cell-free DNA. The model hierarchically learns regulatory features, and its attention maps highlight functionally relevant regions, demonstrating high biological interpretability. These results establish scDNAm-GPT as a scalable and generalizable framework for single-cell epigenomic analysis, offering new opportunities to dissect epigenetic regulation in health and disease. Code is available on GitHub (https://github.com/ChaoqiLiang/scDNAm-GPT).
