aiDIVA – Diagnostics of Rare Genetic Diseases Using Large Language Models

Dominic Boceck
Lucia Laugwitz
Marc Sturm
Daniela Bezdan
Axel Gschwind
Tobias B. Haack
Stephan Ossowski

Abstract

Genome sequencing (GS) enables the accurate identification of genetic variants in most genomic regions and is rapidly transforming routine diagnostics for rare diseases (RD). While data generation is streamlined and scalable, efficient prioritization and correct clinical interpretation of the detected alterations remain a challenge, often requiring manual classification by experts with years of training. Hence, there is a need for AI-driven clinical decision support systems that assist clinical experts in identifying causal variants or, in the case of large-scale re-analysis of unsolved cases, fully automate the process. To this end, many tools have been developed to estimate the impact of variants on protein function. However, only a small number of tools combine genomic data, variant annotations, and phenotypic data to diagnose cases.

Here we introduce aiDIVA, an ensemble AI featuring a hierarchically organized set of statistical and machine learning models trained on genomic and phenotypic data to identify the causal variant(s) among the tens of thousands of genetic variants of a patient. aiDIVA generates pathogenicity classifications for each variant using a random forest model and an evidence-based score for dominant and recessive diseases. It combines these predictions with additional clinical metadata to prioritize and rank the most likely causal variants. aiDIVA uses large language models (LLMs) to further improve and explain the results. Finally, the aiDIVA-meta model combines all scores to generate a ranked list of variants. In a benchmark analysis of more than 3,000 diagnostically solved RD patients, the causal variant was among the top-3 candidate variants reported by aiDIVA-meta in 97% of cases. Unlike comparative methods, aiDIVA provides interpretable explanations for the best candidates.
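
To make the benchmark metric concrete, the following is a minimal sketch of how per-variant scores could be combined into a ranked list and how a top-3 hit rate could be computed over solved cases. It is not aiDIVA's implementation: the abstract does not say how aiDIVA-meta combines scores, so the weighted sum, the weights, the `Variant` fields, and all function names here are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str
    pathogenicity: float  # hypothetical model score in [0, 1]
    evidence: float       # hypothetical evidence-based score in [0, 1]
    phenotype: float      # hypothetical phenotype-match score in [0, 1]


def meta_score(v, weights=(0.4, 0.3, 0.3)):
    """Combine the three scores with a weighted sum (an assumption, not aiDIVA-meta)."""
    w_path, w_evid, w_pheno = weights
    return w_path * v.pathogenicity + w_evid * v.evidence + w_pheno * v.phenotype


def rank_variants(variants, weights=(0.4, 0.3, 0.3)):
    """Return variants sorted from most to least likely causal."""
    return sorted(variants, key=lambda v: meta_score(v, weights), reverse=True)


def top_k_recall(cases, k=3):
    """Fraction of cases whose known causal variant is ranked within the top k.

    `cases` is a list of (variants, causal_variant_id) pairs.
    """
    if not cases:
        return 0.0
    hits = 0
    for variants, causal_id in cases:
        top_ids = [v.variant_id for v in rank_variants(variants)[:k]]
        if causal_id in top_ids:
            hits += 1
    return hits / len(cases)


# Toy data: one patient with three candidate variants.
patient = [
    Variant("a", 0.9, 0.8, 0.9),
    Variant("b", 0.2, 0.1, 0.3),
    Variant("c", 0.5, 0.5, 0.5),
]
ranked_ids = [v.variant_id for v in rank_variants(patient)]
```

Under this reading, the 97% figure reported above corresponds to `top_k_recall(cases, k=3)` evaluated over the benchmark cohort, where each case pairs a patient's candidate variants with the diagnostically confirmed causal variant.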

Version published to 10.1101/2025.09.04.25335099 on medRxiv
Sep 7, 2025

Related articles

Large Language Models Enhance Molecular Diagnoses of Mendelian Disorders via A Novel Logic

This article has 15 authors:
1. Zefu Chen
2. Jihao Cai
3. Yongxin Yang
4. Sen Zhao
5. Guozhuang Li
6. Kexin Xu
7. Qing Li
8. Timothy Hospedales
9. Lina Zhao
10. Zhongmin Zhang
11. Zhihong Wu
12. Guixing Qiu
13. Terry Jianguo Zhang
14. Pengfei Liu
15. Nan Wu
This article has no evaluations
Latest version Dec 22, 2025

Benchmarking RNA-seq Tools for Real-World Diagnostic Applications

This article has 15 authors:
1. Sarah Silverstein
2. Kaushik Ganapathy
3. Sandra Donkervoort
4. Veronique Bolduc
5. Ying Hu
6. Justin Moy
7. Prech Uapinyoying
8. Svetlana Gorokhova
9. Vijay Ganesh
10. Ben Weisburd
11. Rotem OrBach
12. A. Reghan Foley
13. Pejman Mohammadi
14. David Adams
15. Carsten Bonnemann
This article has no evaluations
Latest version Jan 29, 2026

VUS. Life: Leveraging Vector Embeddings for Rapid and Accurate Pathogenicity Prediction of Genetic Variants

This article has 6 authors:
1. Jiawei Wu
2. Marissa Stutzman
3. Michael Muriello
4. Joy Lincoln
5. Donald G. Basel
6. Xiaowu Gai
This article has no evaluations
Latest version Jan 21, 2026
