An Interpretable Sparse Graph Contrastive Learning Approach for Identifying Breast Cancer Risk Variants
Abstract
Genome-wide association studies (GWASs) have identified over 2,400 genetic variants associated with breast cancer. Conventional GWAS methods that analyze variants independently often overlook the complex genetic interactions underlying disease susceptibility. Machine learning and deep learning approaches are promising alternatives but face challenges, including overfitting caused by high dimensionality (∼10 million variants) combined with limited sample sizes, and limited interpretability. Here, we present GenoGraph, a graph-based contrastive learning framework designed to address these limitations by modeling high-dimensional genetic data in low-sample-size scenarios. We demonstrate GenoGraph’s efficacy in a breast cancer case-control classification task, achieving an accuracy of 0.96 on the Biobank of Eastern Finland dataset. GenoGraph identified rs11672773 (ZNF8) as a key risk variant in the Finnish population, with significant interactions with rs10759243 (KLF4) and rs3803662 (TOX3). Furthermore, in silico validation confirmed the biological relevance of these findings, underscoring GenoGraph’s potential to advance breast cancer risk prediction and elucidate genetic interactions for personalized medicine.