Using Deep Learning Models of Gene Regulation to Guide Drug Prioritization

Ivan Ovcharenko
Xiaoqin Huang


Abstract

Drug repurposing offers a cost-effective strategy to accelerate therapeutic discovery, but most computational approaches fail to model noncoding genetic variation. Because over 90% of genome-wide association study (GWAS) risk variants reside in noncoding regions, linking regulatory variation to therapeutic hypotheses remains a major challenge. Here, we developed an integrative deep learning framework that links allele-specific enhancer prediction to transcription factor (TF)-centered gene expression changes and drug-induced transcriptional profiles to prioritize candidate therapeutics. Our cell type-specific deep learning enhancer models accurately distinguish active enhancers across seven cell lines. Using breast cancer as a proof-of-concept, we found that GWAS heritability is significantly enriched in MCF7 enhancers, supporting MCF7 as the cellular context for this disease. Allele-specific variant scoring identified breast cancer risk variants with strong allele-dependent effects, and attribution-based motif discovery revealed enrichment of FOXA1-associated motif features, consistent with FOXA1 upregulation in primary tumors. Integration of the FOXA1 knockdown-induced and drug-induced gene expression profiles identified 63 candidate compounds for treatment of breast cancer, including 18 approved drugs, with recovery of the known breast cancer therapy fulvestrant. Among prioritized compounds, 54% showed anti-correlated transcriptional effects across eight core breast cancer pathways, compared to 5.3% of non-prioritized compounds. Integration of drug-gene interaction data further refined these to eight compounds with supporting experimental or clinical evidence. 
Together, these results establish a regulatory variant-guided drug repurposing framework that connects noncoding genetic variation to therapeutic candidates and provides a generalizable strategy for translating the noncoding genome into pharmacologically relevant hypotheses.
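The step that integrates the FOXA1 knockdown-induced and drug-induced expression profiles can be illustrated with a minimal sketch. The abstract does not say which similarity measure or gene set the authors used, so everything here is an assumption: signatures are represented as gene-to-log2-fold-change dictionaries, similarity is Pearson correlation over shared genes, and compounds whose profiles resemble the knockdown of an upregulated TF are ranked first. The gene names, compound names, and values are hypothetical toy data, not results from the preprint.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def rank_compounds(knockdown_sig, drug_sigs):
    """Rank compounds by how closely their signature matches the knockdown.

    knockdown_sig: dict mapping gene -> log2 fold change after TF knockdown.
    drug_sigs: dict mapping compound -> dict of gene -> log2 fold change.
    Only genes present in both signatures are compared (an assumption).
    Returns (compound, correlation) pairs, most similar first.
    """
    scores = []
    for compound, sig in drug_sigs.items():
        shared = sorted(set(knockdown_sig) & set(sig))
        r = pearson([knockdown_sig[g] for g in shared], [sig[g] for g in shared])
        scores.append((compound, r))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy signatures (log2 fold changes); not data from the preprint.
knockdown = {"GENE_A": -1.5, "GENE_B": -0.8, "GENE_C": 1.2, "GENE_D": 0.3}
drugs = {
    "compound_1": {"GENE_A": -1.2, "GENE_B": -0.5, "GENE_C": 0.9, "GENE_D": 0.1},
    "compound_2": {"GENE_A": 1.0, "GENE_B": 0.7, "GENE_C": -1.1, "GENE_D": -0.2},
    "compound_3": {"GENE_A": 0.2, "GENE_B": -0.1, "GENE_C": 0.1, "GENE_D": -0.3},
}

ranking = rank_compounds(knockdown, drugs)
for compound, r in ranking:
    print(f"{compound}\t{r:+.3f}")
```

In this toy setup, a compound that shifts the same genes in the same direction as the knockdown ranks first, and one that shifts them in the opposite direction ranks last. A rank-based measure such as Spearman correlation, or a restriction to a curated gene set, could be substituted without changing the overall shape of the procedure.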

Version published to 10.64898/2026.05.11.724354 on bioRxiv
May 14, 2026

Related articles

Condition-matched in silico prediction of drug transcriptional responses enables mechanism-guided screening and combination discovery

This article has 5 authors:
1. Meisheng Xiao
2. Yiping He
3. Jianhua Hu
4. Fei Zou
5. Baiming Zou
This article has no evaluations
Latest version Mar 31, 2026

Modeling gene regulatory perturbations via deep learning from high-throughput reporter assays

This article has 13 authors:
1. Revathy Venukuttan
2. Richard Doty
3. Alexander Thomson
4. Yutian Chen
5. Boyao Li
6. Yuncheng Duan
7. Alejandro Barrera
8. Katherine Dura
9. Kuei-Yueh Ko
10. Hilmar Lapp
11. Timothy E. Reddy
12. Andrew S. Allen
13. William H. Majoros
This article has no evaluations
Latest version Mar 31, 2026

Combinatorial epigenomic patterns define regulatory programs underlying disease heterogeneity

This article has 9 authors:
1. Woo Jun Shim
2. Shaine Chenxin Bao
3. Chris Siu Yeung Chow
4. Dalia Mizikovsky
5. Sophie Shen
6. Zachary Riedlshah
7. Qiongyi Zhao
8. Mikael Boden
9. Nathan J. Palpant
This article has no evaluations
Latest version May 5, 2026
