Drug repurposing for rare diseases via a gene-bridged heterogeneous knowledge graph and graph attention network

Abstract

Rare diseases are severely underserved by pharmacological treatments, and computational drug repurposing offers a cost-effective alternative to de novo discovery. We present a reproducible end-to-end pipeline that integrates 3,961 rare disease–gene associations from Orphadata with 98,239 gene–drug records from DisGeNET through multi-stage harmonization (HGNC symbol standardization followed by RapidFuzz fuzzy matching). The result is a large-scale, gene-bridged tripartite knowledge graph for rare diseases (to our knowledge the largest such graph constructed exclusively from Orphadata and DisGeNET), comprising 15,454 nodes and 35,131 edges and spanning 2,249 clinically distinct rare diseases. A Graph Attention Network (GAT) trained on node-type classification as a pretext task achieves macro F1 = 0.651 and ROC-AUC = 0.818 on a stratified held-out test set, with stable performance across five evaluation partitions (SD ≤ 0.007). Drug candidate retrieval via cosine similarity in the GAT embedding space achieves Hits@10 = 0.400 across 200 evaluated disorders (vs. < 0.001 for a random baseline); the clinically validated drug nitisinone is recovered at rank 4 for a disorder of the tyrosine catabolism pathway without pathway annotations. A deployment-ready interface is publicly available on HuggingFace Spaces.
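The retrieval step described above, ranking drugs by cosine similarity to a disease embedding and scoring the ranking with Hits@k, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine`, `rank_drugs`, `hits_at_k`), the toy two-dimensional vectors, and the specific Hits@k definition used here (a disease counts as a hit if at least one known drug appears in its top k) are all assumptions made for illustration. In the actual pipeline the vectors would come from the trained GAT.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_drugs(disease_vec, drug_vecs):
    """Return drug names sorted by descending cosine similarity to the disease."""
    return sorted(
        drug_vecs,
        key=lambda name: cosine(disease_vec, drug_vecs[name]),
        reverse=True,
    )

def hits_at_k(disease_vecs, drug_vecs, known_drugs, k=10):
    """Fraction of diseases with at least one known drug in the top-k candidates.

    Assumed definition for illustration; the paper may define Hits@k differently.
    """
    if not disease_vecs:
        return 0.0
    hits = 0
    for disease, vec in disease_vecs.items():
        top_k = rank_drugs(vec, drug_vecs)[:k]
        if any(drug in known_drugs.get(disease, set()) for drug in top_k):
            hits += 1
    return hits / len(disease_vecs)

# Hypothetical toy embeddings, NOT taken from the paper.
drug_vecs = {"drug_a": [1.0, 0.0], "drug_b": [0.0, 1.0], "drug_c": [0.7, 0.7]}
disease_vecs = {"disease_x": [0.9, 0.1], "disease_y": [0.1, 0.9]}
known_drugs = {"disease_x": {"drug_a"}, "disease_y": {"drug_a"}}

ranking_x = rank_drugs(disease_vecs["disease_x"], drug_vecs)
score_at_1 = hits_at_k(disease_vecs, drug_vecs, known_drugs, k=1)
score_at_3 = hits_at_k(disease_vecs, drug_vecs, known_drugs, k=3)
```

In this toy setup `drug_a` ranks first for `disease_x` but last for `disease_y`, so only one of the two diseases counts as a hit at k = 1, while both do at k = 3. The random baseline quoted in the abstract corresponds to ranking candidates with no embedding signal at all.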
