PRSNet-2: End-to-end genotype-to-phenotype prediction via hierarchical graph neural networks

Abstract

The emergence of large-scale biobanks has opened unprecedented opportunities for developing data-driven approaches, especially deep learning-based methods, for genotype-to-phenotype (G2P) prediction. However, designing an end-to-end framework that directly leverages extremely high-dimensional genotypic data while mitigating overfitting and ensuring robust prediction remains a significant challenge. In this study, we introduce PRSNet-2, an interpretable end-to-end deep learning framework designed to predict complex phenotypes directly from large-scale genotypic data. PRSNet-2 uses a novel hierarchical graph neural network (GNN) architecture that first employs a multi-kernel aggregator to map high-dimensional genotypic features to gene-level representations. It then models gene-gene interactions through message-passing operations and uses an attention-based readout module to generate interpretable phenotypic predictions. We further introduce significance-guided regularization strategies that draw on prior genetic associations to improve the model’s generalizability. Extensive empirical evaluations across multiple complex traits and diseases demonstrate that PRSNet-2 consistently outperforms a variety of baseline methods and is more resistant to overfitting, even when modeling around half a million single nucleotide polymorphisms (SNPs). Moreover, PRSNet-2 can be readily extended to integrate multiple genome-wide association study (GWAS) datasets into a single model, thereby enhancing predictive performance for both single-phenotype and multi-phenotype prediction tasks. The inherent interpretability of PRSNet-2 further facilitates the identification of disease-relevant genes, functional gene modules, and potential therapeutic targets. In summary, PRSNet-2 offers a powerful, versatile tool for both genetic risk stratification and the discovery of biological insights.
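
The three stages named in the abstract (SNP-to-gene aggregation, message passing over a gene-gene graph, and an attention-based readout) can be illustrated with a toy forward pass. This is a minimal NumPy sketch under stated assumptions, not the PRSNet-2 implementation: the abstract does not specify the form of the multi-kernel aggregator, so it is approximated here by several per-SNP weight vectors summed within each gene, and every name, shape, and parameter below (`forward`, `snp_kernels`, `w_msg`, `attn_vec`, `w_out`) is hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def forward(genotypes, snp_to_gene, adjacency, params):
    """Toy hierarchical forward pass for one individual (illustrative only).

    genotypes   : (n_snps,) allele dosages in {0, 1, 2}
    snp_to_gene : (n_snps,) index of the gene each SNP is assigned to
    adjacency   : (n_genes, n_genes) symmetric 0/1 gene-gene graph
    params      : dict of weights (hypothetical names, random in this sketch)
    """
    n_genes = adjacency.shape[0]

    # 1) SNP -> gene aggregation. ASSUMPTION: "multi-kernel" is modelled as
    #    n_kernels independent per-SNP weightings, each summed within a gene.
    weighted = params["snp_kernels"] * genotypes[:, None]  # (n_snps, n_kernels)
    gene_feats = np.zeros((n_genes, weighted.shape[1]))
    np.add.at(gene_feats, snp_to_gene, weighted)           # scatter-add per gene

    # 2) One round of message passing: average each gene with its neighbours
    #    (self-loops added), then apply a linear map and a nonlinearity.
    a_hat = adjacency + np.eye(n_genes)
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)
    hidden = np.tanh(a_hat @ gene_feats @ params["w_msg"])  # (n_genes, d)

    # 3) Attention readout: per-gene weights that sum to 1, which is what
    #    makes the gene-level contributions inspectable.
    attn = softmax(hidden @ params["attn_vec"])             # (n_genes,)
    pooled = attn @ hidden                                  # (d,)
    score = float(pooled @ params["w_out"])
    return score, attn


# Tiny synthetic example: 12 SNPs, 4 genes on a path graph.
rng = np.random.default_rng(0)
n_snps, n_genes, n_kernels, d = 12, 4, 3, 5
genotypes = rng.integers(0, 3, size=n_snps).astype(float)
snp_to_gene = rng.integers(0, n_genes, size=n_snps)
adjacency = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]], dtype=float)
params = {
    "snp_kernels": rng.normal(size=(n_snps, n_kernels)),
    "w_msg": rng.normal(size=(n_kernels, d)),
    "attn_vec": rng.normal(size=d),
    "w_out": rng.normal(size=d),
}
score, attn = forward(genotypes, snp_to_gene, adjacency, params)
```

Here `attn` holds one non-negative weight per gene, which is the kind of quantity an attention-based readout exposes for interpretation. The significance-guided regularization and the training procedure are not shown, since the abstract gives no detail on them.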
