Clinical Trial and Ontology-Derived Positive and Negative Benchmark Datasets for Drug Repurposing Across Rare Diseases
Abstract
Evaluating the potential applications of a medicine is a fundamental challenge in drug development. There is a lack of standardized, decision-oriented benchmarks that test whether computational models can generalize therapeutic hypotheses across diseases in ways that reflect real-world pharmaceutical investment decisions. To address this gap, we introduce two complementary resources: the Indication Expansion Investment Decision Network (IxIDN) and the Orphanet Rare Disease Ontology Negative-network (ORDON). IxIDN is a clinical-trial-derived positive benchmark constructed by projecting drug–disease associations from pharmaceutical clinical trials into a disease–disease network. Each edge connects two diseases that have entered clinical trials for the same drug, thereby capturing cases in which concrete indication-expansion decisions have been made. The current release contains 574 rare diseases and 5,336 edges. ORDON, in contrast, serves as a stringent, biology-aware negative benchmark derived from the authoritative Orphanet Rare Disease Ontology. It identifies maximally distant disease pairs according to the ontology's curated hierarchical structure and genetics-linked inheritance patterns, providing 793 rare diseases and 5,000 edges that represent high-separation negative candidates across therapeutic areas. Together, IxIDN and ORDON enable rigorous evaluation of cross-evidence generalization, from clinical trials to disease ontology, for Disease–Disease Association Learning (DDAL), a core task in mechanism-centered drug repurposing and indication expansion. All data are publicly available with detailed metadata, enabling reproducible evaluation of models on transparent, decision-relevant benchmarks.
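The two constructions described above can be illustrated with a small sketch. It is not the authors' pipeline. The input formats (a list of (drug, disease) trial records, a child-to-parent map for the ontology, a disease-to-inheritance-pattern map) and all function names are hypothetical. The positive side is a standard bipartite projection: two diseases are linked when at least one drug was trialed in both, and the edge weight counts shared drugs. The negative side assumes, for simplicity, a single parent per ontology concept (Orphanet classifications can place a disease under several parents). It measures separation as the path length through the lowest common ancestor and keeps only pairs whose inheritance patterns differ. The actual ORDON selection criteria may differ.

```python
from collections import defaultdict
from itertools import combinations


def project_trials_to_disease_network(trials):
    """Project (drug, disease) trial records onto disease-disease edges.

    Hypothetical input format. Two diseases are linked when at least one drug
    entered clinical trials for both; the edge weight counts shared drugs.
    """
    diseases_by_drug = defaultdict(set)
    for drug, disease in trials:
        diseases_by_drug[drug].add(disease)

    edges = defaultdict(int)
    for diseases in diseases_by_drug.values():
        # Sort so each unordered pair is stored under one canonical key.
        for a, b in combinations(sorted(diseases), 2):
            edges[(a, b)] += 1
    return dict(edges)


def ancestors(node, parent):
    """Return [node, parent, grandparent, ..., root] (assumes a single-parent tree)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges on the path from a to b via their lowest common ancestor."""
    path_a = ancestors(a, parent)
    steps_from_b = {n: i for i, n in enumerate(ancestors(b, parent))}
    for i, node in enumerate(path_a):
        if node in steps_from_b:
            return i + steps_from_b[node]
    return float("inf")  # no shared ancestor: disconnected components


def most_distant_pairs(diseases, parent, inheritance, k):
    """Hypothetical negative selection: the k most separated pairs whose
    inheritance labels differ (equal labels, including two missing ones, are skipped)."""
    candidates = [
        (tree_distance(a, b, parent), a, b)
        for a, b in combinations(sorted(diseases), 2)
        if inheritance.get(a) != inheritance.get(b)
    ]
    # Largest distance first; ties broken alphabetically for reproducibility.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(a, b) for _, a, b in candidates[:k]]


if __name__ == "__main__":
    trials = [("drugX", "D1"), ("drugX", "D2"), ("drugX", "D3"),
              ("drugY", "D2"), ("drugY", "D3")]
    print(project_trials_to_disease_network(trials))
    # {('D1', 'D2'): 1, ('D1', 'D3'): 1, ('D2', 'D3'): 2}

    parent = {"D1": "G1", "D2": "G1", "D3": "G2", "G1": "root", "G2": "root"}
    inheritance = {"D1": "AR", "D2": "AD", "D3": "AD"}
    print(most_distant_pairs(["D1", "D2", "D3"], parent, inheritance, k=1))
    # [('D1', 'D3')]
```

On the toy data, drugX links D1, D2 and D3 pairwise while drugY adds a second shared drug to (D2, D3). D1 and D3 meet only at the root and carry different inheritance labels, so they form the most separated negative pair; (D2, D3) is excluded because both are labeled AD.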