A Large-Scale Pharmacogenomic Knowledge Graph for Drug-Gene-Variant-Disease Discovery

Muhammad Omar Faruk

Abstract

Precision therapeutics depends on the ability to reason jointly over genes, variants, drugs, diseases, adverse drug reactions (ADRs), and molecular pathways without contaminating evaluation with future knowledge. I present a large-scale pharmacogenomic knowledge graph (PGx-KG) that integrates PharmGKB, ClinVar, SIDER, and Reactome, harmonized to HGNC, RxNorm, MeSH, and ChEBI identifiers, yielding 3,744,727 nodes and 9,645,367 edges across six major relation families. A leakage-free processing pipeline enforces version-aware chronological splits, publication-date audits, symmetric and transitive consistency checks, and cross-database de-duplication, eliminating temporal violations in held-out audits. As a first benchmark, a bilinear link-prediction model implemented in PyTorch Geometric achieves a mean reciprocal rank (MRR) of 0.347 (95% bootstrap CI [0.321, 0.368]), Hits@1/3/10 of 0.234/0.417/0.589, and an AUROC of 0.823 on validation data; five-fold temporal cross-validation yields an MRR of 0.341 ± 0.018, and a 2024 hold-out achieves an MRR of 0.329. Ranked candidate lists surface clinically relevant hypotheses, including CYP2D6–codeine dosing and HLA-B*15:02–carbamazepine risk, while also proposing pathway-level drug repurposing opportunities for expert review.
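
For readers unfamiliar with the ranking metrics quoted in the abstract, the following is a minimal sketch of how MRR, Hits@k, and a percentile bootstrap confidence interval can be computed from a list of ranks. It is not the author's evaluation code: the function names, the toy ranks, and the choice of a percentile bootstrap with 1,000 resamples are all assumptions made for illustration.

```python
import random
from statistics import mean


def mrr(ranks):
    """Mean reciprocal rank, given the 1-based rank of each true entity."""
    return mean(1.0 / r for r in ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def bootstrap_ci(ranks, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric (illustrative settings)."""
    rng = random.Random(seed)
    n = len(ranks)
    # Resample the ranks with replacement and recompute the metric each time.
    stats = sorted(metric(rng.choices(ranks, k=n)) for _ in range(n_resamples))
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Hypothetical ranks for five held-out edges (not data from the preprint).
toy_ranks = [1, 2, 4, 10, 50]

toy_mrr = mrr(toy_ranks)                 # (1 + 1/2 + 1/4 + 1/10 + 1/50) / 5
toy_hits3 = hits_at_k(toy_ranks, 3)      # two of five ranks are <= 3
toy_ci = bootstrap_ci(toy_ranks, mrr)
```

On real data, `toy_ranks` would hold one rank per held-out edge, obtained by scoring the true tail entity against all candidate entities.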

Version published to 10.1101/2025.09.24.25336269 on medRxiv
Sep 25, 2025

Related articles

Path-Probability Models Outperform Point-Estimate Scores for Noncoding GWAS Gene Prioritization

This article has 1 author:
1. Abduxoliq Ashuraliyev
This article has no evaluations
Latest version Dec 22, 2025

A Hybrid Pharmacovigilance Method for National-Scale Comorbidity Discovery: Association Rules with FDA-Approved PRR/Chi-square and EBGM Validation.

This article has 1 author:
1. Kaossara Osseni
This article has no evaluations
Latest version Dec 24, 2025

PRESSnet: a novel framework for patient stratification and biomarker discovery using clinical knowledge graphs

This article has 11 authors:
1. Jake Cohen-Setton
2. Shruti Shikhare
3. Ioannis Kagiampakis
4. Domingo Salazar
5. Miguel Goncalves
6. Elizabeth Coker
7. Sanddhya Jayabalan
8. Damian Bikiel
9. Ben Sidders
10. Etai Jacob
11. Krishna Bulusu
This article has no evaluations
Latest version Dec 15, 2025
