PRSNet-2: End-to-end genotype-to-phenotype prediction via hierarchical graph neural networks
Abstract
The emergence of large-scale biobanks has opened unprecedented opportunities for the development of data-driven approaches, especially deep learning-based methods, for genotype-to-phenotype (G2P) prediction. However, designing an end-to-end framework that can directly leverage extremely high-dimensional genotypic data while mitigating overfitting and ensuring robust prediction remains a significant challenge. In this study, we introduce PRSNet-2, an interpretable end-to-end deep learning framework designed to predict complex phenotypes directly from large-scale genotypic data. PRSNet-2 uses a novel hierarchical graph neural network (GNN) architecture that first employs a multi-kernel aggregator to map high-dimensional genotypic features to gene-level representations. Next, it models gene-gene interactions through message-passing operations and uses an attention-based readout module to generate interpretable phenotypic predictions. We further introduce significance-guided regularization strategies that draw on prior genetic associations to improve the model's generalizability. Extensive empirical evaluations across multiple complex traits and diseases demonstrate that PRSNet-2 consistently outperforms a variety of baseline methods and is more resistant to overfitting, even when modeling around half a million single nucleotide polymorphisms (SNPs). Moreover, PRSNet-2 can be readily extended to integrate multiple genome-wide association study (GWAS) datasets into a single model, thereby enhancing predictive performance for both single-phenotype and multi-phenotype prediction tasks. The inherent interpretability of PRSNet-2 further facilitates the identification of disease-relevant genes, functional gene modules, and potential therapeutic targets. In summary, PRSNet-2 offers a powerful, versatile tool for both genetic risk stratification and the discovery of biological insights.
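To make the three-stage data flow described in the abstract concrete (SNP-to-gene aggregation, gene-gene message passing, attention-based readout), the following is a minimal NumPy sketch of a single forward pass. It is an illustration under stated assumptions, not the PRSNet-2 implementation: the function name `toy_hierarchical_forward`, the parameter names, the choice of one weight per SNP per kernel, the row-normalized adjacency, and the `tanh` nonlinearities are all hypothetical choices made here for clarity. The significance-guided regularization is not sketched because the abstract does not specify its form.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def toy_hierarchical_forward(genotypes, snp_to_gene, gene_adj, params):
    """Hypothetical forward pass: SNPs -> genes -> message passing -> readout.

    genotypes:   (n_samples, n_snps) allele dosages, e.g. values in {0, 1, 2}
    snp_to_gene: (n_snps, n_genes) binary SNP-to-gene membership mask
    gene_adj:    (n_genes, n_genes) symmetric gene-gene adjacency, no self-loops
    params:      dict with "kernels" (K, n_snps), "w_in" (K, d),
                 "w_msg" (d, d), "att" (d,), "w_out" (d,)
    """
    # 1) Multi-kernel aggregation (assumed form): each of K kernels holds one
    #    weight per SNP; summing weighted dosages over a gene's SNPs yields a
    #    K-dimensional feature per gene, which is then projected to d dims.
    gene_feats = np.einsum(
        "ns,ks,sg->ngk", genotypes, params["kernels"], snp_to_gene
    )                                              # (n_samples, n_genes, K)
    h = np.tanh(gene_feats @ params["w_in"])       # (n_samples, n_genes, d)

    # 2) One message-passing round: average over neighbors (plus self-loop),
    #    then apply a shared linear map and nonlinearity.
    a = gene_adj + np.eye(gene_adj.shape[0])
    a = a / a.sum(axis=1, keepdims=True)
    h = np.tanh(np.einsum("ij,njd->nid", a, h) @ params["w_msg"])

    # 3) Attention readout: per-gene scores are normalized into weights that
    #    sum to 1 per sample; the weights double as gene-level importances.
    attn = softmax(h @ params["att"], axis=1)      # (n_samples, n_genes)
    pooled = np.einsum("ng,ngd->nd", attn, h)      # (n_samples, d)
    pred = pooled @ params["w_out"]                # (n_samples,)
    return pred, attn
```

In this sketch the returned `attn` matrix is what would support interpretation: for each sample it assigns every gene a non-negative weight, so genes with consistently high weights are candidates for closer inspection. A real model at biobank scale would use sparse SNP-to-gene and gene-gene structures rather than the dense arrays used here.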