How to improve polygenic prediction from whole-genome sequencing data by leveraging predicted epigenomic features?

Wanwen Zeng
Hanmin Guo
Qiao Liu
Wing Hung Wong

Abstract

Polygenic risk scores (PRS) are crucial in genetics for predicting individual susceptibility to complex diseases by aggregating the effects of numerous genetic variants. Whole-genome sequencing (WGS) has revolutionized our ability to detect rare and even de novo variants, creating an opportunity to develop new PRS methods that can effectively leverage rare variants and capture the complex relationships among different variants. Furthermore, regulatory mechanisms play a crucial role in gene expression and disease manifestation, offering avenues to further enhance the performance and interpretation of PRS predictions. Through simulation studies, we identified aspects where current PRS methods face challenges when applied to WGS data, pointing to opportunities for further improvement. To address these challenges, we developed Epi-PRS, an approach that uses genomic large language models (LLMs) to impute epigenomic signals across diverse cellular contexts, for use as intermediate variables between genotype and phenotype. A pretrained LLM transforms genotypes into epigenomic signals using personal diploid sequences as inputs, and the genetic risk is then estimated from the imputed personal epigenomic signals. Epi-PRS enhances the assessment of personal variant impacts by considering genotypic and regulatory information jointly within large genomic regions. Our simulation results demonstrated that incorporating non-linear models, rare variants, and regulatory information can provide more precise PRS prediction and a better understanding of genetic risk. Applying Epi-PRS to real data from the UK Biobank, we further showed that Epi-PRS significantly outperforms existing PRS methods for two major diseases: breast cancer and diabetes.
This study suggests that PRS methods can benefit from incorporating non-linear models, rare variants, and regulatory information, highlighting the potential for significant advances in disease risk modeling and precision medicine.
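
As an illustration of the two ideas contrasted in the abstract, the following minimal Python sketch places a classic additive PRS (a weighted sum of allele dosages) next to a two-stage score in which genotype is first mapped to imputed epigenomic features. It is not the Epi-PRS implementation: the function names, the `toy_signal` stand-in for the pretrained genomic LLM, the averaging of the two haplotypes, and the linear readout are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple


def additive_prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Classic additive PRS: weighted sum of allele dosages (0, 1, or 2)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * g for w, g in zip(weights, dosages))


def toy_signal(seq: str) -> List[float]:
    """Toy placeholder for a sequence-to-epigenome model (NOT a real LLM).

    Returns two made-up features: GC fraction and CpG count per base.
    """
    n = max(len(seq), 1)
    gc_fraction = (seq.count("G") + seq.count("C")) / n
    cpg_density = seq.count("CG") / n
    return [gc_fraction, cpg_density]


def two_stage_score(
    haplotypes: Tuple[str, str],
    impute_signal: Callable[[str], List[float]],
    feature_weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Genotype -> imputed epigenomic features -> risk score.

    Each haplotype sequence of the diploid genome is passed through the
    imputation model. Averaging the two haplotypes and using a linear
    readout are simplifying assumptions; a non-linear predictor could
    replace the final weighted sum.
    """
    hap_a, hap_b = haplotypes
    feats_a = impute_signal(hap_a)
    feats_b = impute_signal(hap_b)
    features = [(a + b) / 2 for a, b in zip(feats_a, feats_b)]
    if len(features) != len(feature_weights):
        raise ValueError("feature_weights must match the number of features")
    return bias + sum(w * f for w, f in zip(feature_weights, features))


if __name__ == "__main__":
    # Three variants with hypothetical effect sizes.
    print(additive_prs([0, 1, 2], [0.5, -0.2, 0.1]))
    # One region, two hypothetical haplotype sequences.
    print(two_stage_score(("ACGT", "AAAA"), toy_signal, [1.0, 2.0]))
```

In the first function a variant contributes only through its own weight, whereas in the second any sequence change, including a rare one, contributes through its effect on the imputed features.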

Significance Statement

Epi-PRS improves polygenic risk scoring by integrating genomic large language models (LLMs) to impute epigenomic signals as intermediaries between genotype and phenotype. This approach enables a more comprehensive assessment of personal variant impacts by incorporating non-linear models, rare variants, and regulatory mechanisms. By leveraging a genomic LLM trained on a massive amount of reference epigenomic data, Epi-PRS has demonstrated superior performance over existing PRS methods in predicting genetic risk for breast cancer and diabetes in UK Biobank data. These results highlight the potential of Epi-PRS to improve disease risk modeling and advance the field of precision medicine.

Version published to medRxiv (DOI: 10.1101/2024.10.04.24314860) on Oct 6, 2024

Related articles

Decoding Complex Genotype-Phenotype Interactions by Discretizing the Genome

This article has 6 authors:
1. Jędrzej Kubica
2. Hetvi Jethwani
3. Krzysztof H. Banecki
4. Mauricio Moldes
5. Dariusz Plewczynski
6. Ben Busby
This article has no evaluations. Latest version: Dec 17, 2025

Causal splicing variants revealed by deep-learning integration of single-cell sQTL mapping under influenza infection

This article has 8 authors:
1. Liuyang Wang
2. Guinevere Connelly
3. Trisha Dalapati
4. Angela Jones
5. Benjamin Schott
6. Joseph Trimarco
7. Nicholas Heaton
8. Dennis Ko
This article has no evaluations. Latest version: Jan 6, 2026

Evidence-based genetic variants to gene mapping and prioritization uncovers distinct molecular pathophysiology and therapeutic landscape in polycystic ovary syndrome patients of different ethnicities.

This article has 2 authors:
1. Debojyoti De
2. Sindhuja Rajavelu
This article has no evaluations. Latest version: Jan 22, 2026
