Multi-Stage Graph Attention Networks for Interpretable Alzheimer’s Disease Classification from Genome-Wide Association Data

Abstract

Background

Genome-wide association studies (GWAS) have identified numerous variants associated with neuropsychiatric disorders. Although some loci can carry substantial risk, as in Alzheimer’s Disease (AD), the remaining genetic variance is distributed across many small-effect loci. Polygenic risk scores (PRS) aggregate this risk but do not capture epistatic interactions, and they offer limited biological interpretability and predictive accuracy. Computing gene-level risk scores and integrating known or statistically validated gene-gene associations could improve interpretability, accuracy, or both. Graph Neural Networks (GNNs) can pursue these goals by operating on graph-structured genetic data that model potential epistatic interactions.
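To make the contrast between a PRS and gene-level risk scores concrete, here is a minimal sketch, assuming each SNP is assigned to at most one gene and weighted by its GWAS effect size. The function names, SNP ids, and gene names are hypothetical placeholders; the abstract does not describe the authors' SNP-to-gene mapping or weighting scheme.

```python
from collections import defaultdict

def gene_risk_scores(dosages, weights, snp_to_gene):
    """Hypothetical gene-level risk scores for one individual.

    dosages     : SNP id -> effect-allele dosage (0, 1 or 2)
    weights     : SNP id -> GWAS effect size (e.g. log odds ratio)
    snp_to_gene : SNP id -> gene symbol; SNPs without a gene are pooled under None
    """
    scores = defaultdict(float)
    for snp, beta in weights.items():
        # Missing genotypes are treated as dosage 0 here, purely for simplicity
        scores[snp_to_gene.get(snp)] += dosages.get(snp, 0) * beta
    return dict(scores)

def polygenic_risk_score(dosages, weights):
    """Whole-genome PRS: a single additive sum over every weighted SNP."""
    return sum(dosages.get(snp, 0) * beta for snp, beta in weights.items())

# Hypothetical toy data (SNP ids and gene names are placeholders)
weights = {"snpA": 0.5, "snpB": -0.2, "snpC": 0.1, "snpD": 0.3}
dosages = {"snpA": 2, "snpB": 1, "snpC": 0, "snpD": 1}
snp_to_gene = {"snpA": "GENE_A", "snpB": "GENE_A", "snpC": "GENE_B"}

genes = gene_risk_scores(dosages, weights, snp_to_gene)
prs = polygenic_risk_score(dosages, weights)
print(genes)
print(prs)
```

Because both quantities are additive, the gene-level scores sum back to the PRS; the difference is that they keep a per-gene breakdown that can serve as node features. Repeating the computation with GWAS weights from each of several phenotypes would give each gene a feature vector rather than a single number.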

Methods

We developed a three-stage Graph Attention Network (GAT) classifier using individual-level GWAS data from 7,358 participants across seven Alzheimer’s Disease Center cohorts. Nodes were defined as genes, and each node's features were gene-level risk scores for AD and 11 genetically correlated phenotypes. We evaluated two graph construction strategies: gene co-expression networks derived from hippocampal transcriptomic data, and curated pathway-based graphs. A bilinear context module was also incorporated to capture global gene-gene interactions beyond the graph topology. In Stage 1, a GNN encoder was trained on the graphs; in Stage 2, PRS for non-coding SNPs were injected after the encoder to better capture genetic risk via transfer learning; and in Stage 3, adversarial training with gradient reversal was applied for ancestry debiasing. GNN predictions were ensembled with whole-genome PRS using elastic net regression.
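The core operation of a GAT encoder is neighbourhood attention: each gene scores its graph neighbours, normalizes the scores with a softmax, and takes a weighted average of their projected features. The sketch below is a minimal, single-head version of the standard GAT layer, not the authors' model. It omits multiple heads, the output nonlinearity, and the bilinear context module. The projection `W`, attention vector `a`, and toy graph are hypothetical values; the adjacency could come from either a co-expression or a pathway graph.

```python
import math

def leaky_relu(x, slope=0.2):
    # LeakyReLU with the negative slope commonly used in GAT (0.2)
    return x if x > 0 else slope * x

def project(W, h):
    # Shared linear projection W h (W is a list of rows: out_dim x in_dim)
    return [sum(w * v for w, v in zip(row, h)) for row in W]

def gat_layer(H, adj, W, a, slope=0.2):
    """One single-head graph attention layer (illustrative sketch).

    H   : per-gene feature vectors, e.g. risk scores for several phenotypes
    adj : dict gene index -> list of neighbour indices
    W   : projection matrix, out_dim x in_dim
    a   : attention vector of length 2 * out_dim
    Returns (updated embeddings, per-node dict of attention weights).
    """
    Z = [project(W, h) for h in H]
    d = len(Z[0])
    out, attn = [], []
    for i in range(len(Z)):
        # Each node attends to itself plus its graph neighbours
        nbrs = [i] + [j for j in adj.get(i, []) if j != i]
        # e_ij = LeakyReLU(a^T [W h_i || W h_j])
        scores = [
            leaky_relu(sum(a[k] * Z[i][k] + a[d + k] * Z[j][k] for k in range(d)), slope)
            for j in nbrs
        ]
        m = max(scores)  # subtract the max before exponentiating for stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]  # softmax over the neighbourhood
        # h_i' = sum_j alpha_ij W h_j (output nonlinearity omitted)
        out.append([sum(al * Z[j][k] for al, j in zip(alpha, nbrs)) for k in range(d)])
        attn.append(dict(zip(nbrs, alpha)))
    return out, attn

# Toy example: 3 genes with 2 features each; genes 0 and 1 are linked, gene 2 is isolated
H = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
adj = {0: [1], 1: [0]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [1.0, 0.0, 1.0, 0.0]
out, attn = gat_layer(H, adj, W, a)
print(attn[0])
```

The learned attention weights returned per node are one reason GATs are attractive for interpretability: they indicate which neighbouring genes most influenced each gene's updated representation.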

Results

The best-performing GNN model, a GAT with bilinear context operating on the pathway graph, achieved an AUROC of 0.78 (95% CI: 0.75–0.80). Ensemble models combining Stage 2 or Stage 3 GNN logits with whole-genome PRS achieved an AUROC of 0.82 (95% CI: 0.79–0.84), outperforming PRS alone (AUROC 0.80). GxI attribution and additional explainability analyses revealed stage-specific biological signals, some of which recapitulated known gene-phenotype associations and others of which may point to new areas of inquiry.
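The results are reported as AUROC with 95% confidence intervals. The abstract does not say how the intervals were computed, so the sketch below assumes a percentile bootstrap over participants, one common choice (DeLong's method is another). AUROC is computed here as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function names are hypothetical.

```python
import random

def auroc(labels, scores):
    """AUROC as P(score of a random case > score of a random control); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUROC plus a percentile-bootstrap (1 - alpha) CI, resampling participants."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auroc(labels, scores), lo, hi

# Hypothetical toy predictions (not data from the study)
labels = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
scores = [0.2, 0.7, 0.4, 0.3, 0.9, 0.1, 0.6, 0.5, 0.35, 0.8]
print(bootstrap_auroc_ci(labels, scores, n_boot=500))
```

The pairwise loop is quadratic in sample size and is meant only to show the definition; for thousands of participants, a rank-based formula or an existing library implementation would be the practical choice.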

Conclusion

A multi-stage GAT framework captures complementary, non-additive genetic signal that, when ensembled with PRS, improves the accuracy of AD classification. Post-hoc explainability analyses yield biologically interpretable gene networks, supporting the utility of graph-based deep learning for dissecting complex genetic architectures.