A Chromatin-Structure-Guided Framework for Predictive and Interpretable Regulatory Genomics
Abstract
Chromatin organization shapes gene regulation by linking distal elements across megabase scales, yet most predictive genomics models still treat the genome as linear and do not incorporate three-dimensional structure. Hi-C provides genome-wide chromatin conformation information, but its contact maps are population-averaged, distance-biased, and noisy, which obscures biologically specific contacts. We present CHROME, a framework built on a self-avoiding polymer ensemble null model that identifies physically specific, non-random Hi-C contacts. By encoding these contacts as graph representations, CHROME enables efficient information transfer across spatially connected loci. It combines sequence, chromatin accessibility, or pre-trained embeddings in a graph attention architecture to predict cell-line-specific ChIP-seq profiles, consistently outperforming local encoder baselines and generalizing to an unseen cell line. The resulting graph embeddings also improve prediction of tissue-specific eQTLs and ClinVar variant pathogenicity, outperforming local sequence-based embeddings. Beyond predictive performance, CHROME provides interpretability through attention-derived neighbor-to-center contributions that reveal how spatially connected loci influence local regulatory activity over multi-megabase distances. Together, these results show that incorporating physically validated chromatin interactions enables more accurate and interpretable modeling of gene regulation and variant effects.
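The two ideas in the abstract, filtering contacts against a null ensemble and passing information along the retained contacts with attention, can be illustrated with a small NumPy sketch. This is not the CHROME implementation. It assumes the null model is available as a precomputed array of contact frequencies (`null_ensemble`), uses a simple empirical p-value as the specificity test, and uses a generic single-head, GAT-style attention layer. The function names, parameters, and the `alpha` threshold are all hypothetical.

```python
import numpy as np


def specific_contacts(observed, null_ensemble, alpha=0.05):
    """Return (i, j) pairs whose observed contact frequency exceeds the null.

    observed:      (n, n) symmetric matrix of Hi-C contact frequencies.
    null_ensemble: (k, n, n) contact frequencies from k null samples
                   (assumed precomputed; stands in for a polymer null model).
    """
    # Empirical p-value: share of null samples at least as large as observed.
    exceed = (null_ensemble >= observed[None, :, :]).sum(axis=0)
    pvals = (exceed + 1) / (null_ensemble.shape[0] + 1)
    # Keep each unordered pair once (upper triangle, no diagonal).
    rows, cols = np.where(np.triu(pvals <= alpha, k=1))
    return list(zip(rows.tolist(), cols.tolist()))


def graph_attention(features, edges, weight, attn_vec, slope=0.2):
    """One GAT-style attention layer over an undirected contact graph.

    features: (n, d_in) per-locus features.
    edges:    iterable of (i, j) locus pairs.
    weight:   (d_in, d_out) projection matrix.
    attn_vec: (2 * d_out,) attention vector [a_center, a_neighbor].
    Returns (updated_features, attn), where attn[i, j] is the weight that
    neighbor j contributes to center i; each row sums to 1.
    """
    n = features.shape[0]
    h = features @ weight
    d_out = h.shape[1]

    # Adjacency with self-loops so every locus attends at least to itself.
    adj = np.eye(n, dtype=bool)
    for i, j in edges:
        adj[i, j] = adj[j, i] = True

    # scores[i, j] = LeakyReLU(a_center . h_i + a_neighbor . h_j)
    center = h @ attn_vec[:d_out]
    neighbor = h @ attn_vec[d_out:]
    scores = center[:, None] + neighbor[None, :]
    scores = np.where(scores > 0, scores, slope * scores)

    # Mask non-neighbors, then apply a numerically stable row-wise softmax.
    scores = np.where(adj, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return attn @ h, attn
```

In this sketch, row `i` of the returned `attn` matrix plays the role of the neighbor-to-center contributions for locus `i`: loci that are not connected by a retained contact receive exactly zero weight, so any contribution can be traced to a specific edge in the graph.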