An Interpretable Sparse Graph Contrastive Learning Approach for Identifying Breast Cancer Risk Variants

Gudhe Naga Raju
Jaana M. Hartikainen
Maria Tengström
Katri Pylkäs
Robert Winqvist
Veli-Matti Kosma
Hamid Behravan
Arto Mannermaa

Abstract

Genome-wide association studies (GWASs) have identified over 2,400 genetic variants associated with breast cancer. Conventional GWAS methods, which analyze variants independently, often overlook the complex genetic interactions underlying disease susceptibility. Machine learning and deep learning approaches are promising alternatives but face challenges, including overfitting caused by high dimensionality (∼10 million variants) combined with limited sample sizes, as well as limited interpretability. Here, we present GenoGraph, a graph-based contrastive learning framework designed to address these limitations by modeling high-dimensional genetic data in low-sample-size scenarios. We demonstrate GenoGraph’s efficacy in a breast cancer case-control classification task, achieving an accuracy of 0.96 on the Biobank of Eastern Finland dataset. GenoGraph identified rs11672773 (ZNF8) as a key risk variant in the Finnish population, with significant interactions with rs10759243 (KLF4) and rs3803662 (TOX3). Furthermore, in silico validation confirmed the biological relevance of these findings, underscoring GenoGraph’s potential to advance breast cancer risk prediction and elucidate genetic interactions for personalized medicine.

Version published to 10.1101/2025.01.13.25320451 on medRxiv
Jan 15, 2025

Related articles

PRESSnet: a novel framework for patient stratification and biomarker discovery using clinical knowledge graphs

This article has 11 authors:
1. Jake Cohen-Setton
2. Shruti Shikhare
3. Ioannis Kagiampakis
4. Domingo Salazar
5. Miguel Goncalves
6. Elizabeth Coker
7. Sanddhya Jayabalan
8. Damian Bikiel
9. Ben Sidders
10. Etai Jacob
11. Krishna Bulusu
This article has no evaluations
Latest version Dec 15, 2025

Cross-Platform Reproducible Modeling of Breast Cancer Prognosis Using the Core-PAM50 Gene Signature

This article has 2 authors:
1. Rafael de Negreiros Botan
2. Joao Batista de Sousa
This article has no evaluations
Latest version Dec 19, 2025

Predicting gene expression from whole slide images in prostate cancer using deep learning

This article has 14 authors:
1. Anxuan Han
2. Bo Li
3. Chui Yan Mah
4. Jessica Logan
5. Yanan Wang
6. Ning Liu
7. Feargal Ryan
8. David Lynn
9. Darren Foreman
10. John O’Leary
11. Douglas Brooks
12. Jose Polo
13. Lisa Butler
14. Fuyi Li
This article has no evaluations
Latest version Feb 4, 2026
