Reframing AI for Rare Disease Recognition

Wei-Qi Wei
Chao Yan
Wu-Chen Su
Yi Xin
Monika Grabowska
Vern Kerchberger
Victor Borza
Jinlian Wang
Liwei Wang
Rui Li
Jacob Lynn
Alyson Dickson
Cathy Shyr
QiPing Feng
C Stein
Kai Wang
Peter Embí
Bradley Malin
Hongfang Liu

Abstract

Rare diseases affect over 300 million people worldwide, yet patients often endure years-long diagnostic delays that limit timely intervention and trial opportunities. Computational rare disease recognition (RDR) remains constrained by knowledge resources that are often incomplete, heterogeneous, and dependent on extensive multi-disciplinary expert curation that cannot scale. Large language models (LLMs) applied directly for end-to-end diagnosis or disease discrimination face similar knowledge bottlenecks while also raising concerns around cost, reproducibility, and data governance. Here, we introduce GEN-KnowRD, a knowledge-layer-first framework that leverages LLMs to generate schema-guided rare disease profiles, systematically assesses their quality, and constructs a computable knowledge base (PheMAP-RD) for local deployment. GEN-KnowRD integrates this knowledge into lightweight inference pipelines for both general-purpose disease screening and specialized early discrimination from longitudinal electronic health records. In tests using six public benchmarks for general-purpose screening (9,290 patients spanning 798 rare diseases), GEN-KnowRD substantially improved disease ranking versus 1) a state-of-the-art, HPO-centered diagnostic framework (up to 345.8% improvement in top-1 success), 2) advanced end-to-end LLM reasoning (up to 129.1% improvement), and 3) a variant of GEN-KnowRD instantiated with expert-curated knowledge rather than LLM-generated profiles. In two real-world cohorts for early diagnosis of idiopathic pulmonary fibrosis (511 patients) as a use case, GEN-KnowRD also demonstrated robust discrimination performance gains, supporting effective RDR during the pre-diagnostic window. 
These findings demonstrate that repositioning LLMs from diagnostic reasoning to the knowledge layer—decoupling knowledge construction from patient-level inference—yields stronger RDR, while providing scalable, continuously updatable, and reusable infrastructure for diagnosis, screening, and clinical research across the rare disease landscape.
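The abstract reports disease-ranking results as top-1 success and as percentage improvements over baselines. As a minimal sketch of how such figures are commonly computed (an assumption about the metric definitions, not the authors' evaluation code), the following treats top-k success as the fraction of patients whose true disease appears among the first k ranked candidates, and improvement as relative change against a baseline. The function names and the toy rankings are hypothetical and do not reproduce any result from the paper.

```python
from typing import Sequence


def top_k_success(rankings: Sequence[Sequence[str]],
                  truths: Sequence[str],
                  k: int = 1) -> float:
    """Fraction of patients whose true disease is among the first k candidates."""
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not truths:
        raise ValueError("at least one patient is required")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(rankings, truths))
    return hits / len(truths)


def relative_improvement_pct(baseline: float, new: float) -> float:
    """Relative change of `new` over `baseline`, expressed in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Hypothetical toy data: four patients with true diseases A-D.
truths = ["A", "B", "C", "D"]
baseline_rankings = [["A", "X"], ["X", "B"], ["X", "Y"], ["Y", "X"]]
new_rankings = [["A", "X"], ["B", "X"], ["C", "Y"], ["X", "D"]]

baseline_top1 = top_k_success(baseline_rankings, truths, k=1)  # 1 of 4 correct
new_top1 = top_k_success(new_rankings, truths, k=1)            # 3 of 4 correct
improvement = relative_improvement_pct(baseline_top1, new_top1)
print(f"top-1: {baseline_top1:.2f} -> {new_top1:.2f} ({improvement:.1f}% improvement)")
```

Under this definition a relative improvement above 100% means the new method's success rate more than doubled the baseline's, which is why such figures can look large when the baseline top-1 rate is low.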

Version published to 10.21203/rs.3.rs-9036259/v1 on Research Square
Apr 2, 2026

Related articles

AI in Variant Analysis: Fast Track to Genetic Diagnoses

This article has 6 authors:
1. Elizabeth J. Wilk
2. Sasha Taluri
3. Timothy C. Howton
4. Anthony B. Crumley
5. Michal Mrug
6. Brittany N. Lasseigne
This article has no evaluations. Latest version: Apr 10, 2026.

Generalist Foundation Models Are Not Clinical Enough for Hospital Operations

This article has 19 authors:
1. Eric Oermann
2. Lavender Jiang
3. Angelica Chen
4. Xu Han
5. Chris Liu
6. Radhika Dua
7. Kevin Eaton
8. Frederick Wolff
9. Robert Steele
10. Jeff Zhang
11. Anton Alyakin
12. Qingkai Pan
13. Yanbing Chen
14. Karl Sangwon
15. Daniel Alber
16. Jaden Stryker
17. Jin Lee
18. Yindalon Aphinyanaphongs
19. Kyunghyun Cho
This article has no evaluations. Latest version: Mar 19, 2026.

DLNDD: An Explainable Deep Learning Framework for the Early Detection and Classification of Rare Diseases

This article has 1 author:
1. Mian Muhammad Hamza
This article has no evaluations. Latest version: Apr 10, 2026.
