Puget predicts gene expression across cell types using sequence and 3D chromatin organization data

Abstract

Gene expression is governed by both linear DNA sequence and three-dimensional (3D) chromatin architecture. Most gene expression prediction models rely on sequence alone, so they fail to capture structural context and to generalize to unseen cell types. We present Puget, a deep learning model that predicts cell type-specific gene expression from DNA sequence and Hi-C data, which capture 3D chromatin organization. Puget pairs pretrained sequence and Hi-C encoders with a lightweight transformer decoder. Using paired Hi-C/RNA-seq data from 36 human and 4 mouse biosamples, we evaluate how well Puget generalizes to held-out genes, to held-out biosamples, and from human to mouse. Relative to a sequence-only baseline, Puget improves cross-biosample Pearson correlation by up to 25% on highly variable genes in training biosamples. Unlike the sequence-only model, it also generalizes to held-out biosamples and across species. In addition, in silico perturbation experiments show that Puget can prioritize experimentally validated enhancer-gene pairs. Together, these results highlight a generalizable approach for modeling gene expression from sequence and 3D chromatin organization.
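The abstract reports gains in "cross-biosample Pearson correlation" on highly variable genes but does not define the metric. One common reading, assumed here and not confirmed by the source, is: for each gene, correlate predicted and observed expression across biosamples, then average that correlation over the selected genes. The sketch below implements that reading in plain Python. The helper names (`pearson`, `cross_biosample_pearson`, `highly_variable_genes`) and the variance-based rule for choosing highly variable genes are hypothetical illustrations, not Puget's evaluation code.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson r between two equal-length sequences; NaN if either is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def highly_variable_genes(obs, top_k):
    """Hypothetical selection rule: the top_k genes ranked by variance of
    observed expression across biosamples."""
    return sorted(obs, key=lambda g: pvariance(obs[g]), reverse=True)[:top_k]


def cross_biosample_pearson(pred, obs, genes):
    """Assumed metric: per-gene Pearson r across biosamples, averaged over genes.

    pred, obs: dict mapping gene -> list of expression values, one per
    biosample, in the same biosample order for both dicts.
    Genes whose r is undefined (constant vectors) are skipped.
    """
    rs = [pearson(pred[g], obs[g]) for g in genes]
    rs = [r for r in rs if r == r]  # NaN != NaN, so this drops undefined values
    return mean(rs) if rs else float("nan")


# Toy data: three genes measured in three biosamples (values are made up).
obs = {"A": [1.0, 2.0, 3.0], "B": [5.0, 5.0, 5.1], "C": [0.0, 4.0, 8.0]}
pred = {"A": [1.1, 1.9, 3.2], "B": [5.0, 5.0, 5.0], "C": [0.5, 3.5, 7.0]}

hvg = highly_variable_genes(obs, top_k=2)
print(hvg, round(cross_biosample_pearson(pred, obs, hvg), 3))
```

Because the correlation is computed across biosamples rather than across genes, a model that predicts only a gene's average level (as a sequence-only model would when the sequence is identical in every biosample) produces a constant vector and gets no credit under this metric. That makes it a reasonable way to measure how well a model captures cell type-specific variation.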