Puget predicts gene expression across cell types using sequence and 3D chromatin organization data
Abstract
Gene expression is governed by both linear DNA sequence and three-dimensional (3D) chromatin architecture. Most gene expression prediction models rely on sequence alone, thereby failing to capture structural context and to generalize to unseen cell types. We present Puget, a deep learning model that predicts cell type-specific gene expression from sequence and Hi-C data, which captures 3D chromatin organization. Puget pairs pretrained sequence and Hi-C encoders with a lightweight transformer decoder. Using paired Hi-C/RNA-seq from 36 human and 4 mouse biosamples, we evaluate the ability of Puget to generalize to held-out genes, held-out biosamples, and from human to mouse. Relative to a sequence-only baseline, Puget improves cross-biosample Pearson correlation by up to 25% on highly variable genes in training biosamples and, unlike the sequence-only model, generalizes to held-out biosamples and across species. In addition, in silico perturbation experiments show that Puget can prioritize experimentally validated enhancer-gene pairs. Together, these results highlight a generalizable approach for modeling gene expression from sequence and 3D chromatin organization.
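The abstract reports gains in cross-biosample Pearson correlation but does not define the metric. A minimal sketch follows, assuming it means a per-gene Pearson correlation between predicted and observed expression, computed across biosamples. The function names, the dictionary layout, and the gene labels are hypothetical and are not taken from the preprint.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns NaN when either sequence has zero variance, since the
    correlation is undefined in that case.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def cross_biosample_pearson(pred, obs):
    """Per-gene correlation across biosamples (assumed definition).

    pred, obs: dicts mapping a gene label to a list of expression
    values, one value per biosample, in the same biosample order.
    """
    return {gene: pearson(pred[gene], obs[gene]) for gene in obs}


# Toy values for illustration only; these are not data from the preprint.
observed = {"geneA": [1.0, 2.0, 3.0], "geneB": [1.0, 2.0, 3.0], "geneC": [5.0, 5.0, 5.0]}
predicted = {"geneA": [2.0, 4.0, 6.0], "geneB": [3.0, 2.0, 1.0], "geneC": [1.0, 2.0, 3.0]}
scores = cross_biosample_pearson(predicted, observed)
```

Under this reading, a gene whose expression does not vary across biosamples (here `geneC`) has an undefined score, which is one reason such a metric is naturally reported on highly variable genes.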