DeepSeek as the paradigm shift in rare disease diagnosis – the power of a fully automated genetic variant classification system

Wei Ma
Grace Fong
Joe Lai
Heidi Wu
Shirley Pik Ying Hue
Jonson Ying
The Hong Kong Genome Project
Annie Tsz Wai Chu
Brian Hon Yin Chung

Abstract

Large language models (LLMs) have been extensively tested for incorporation into medical applications in recent years, yet their potential in clinical genetics, particularly in diagnosing rare diseases, remains underexplored. Recent advancements in LLMs have improved their reasoning capabilities and transparency, facilitating significant enhancements in clinical workflow designs. The open-source DeepSeek model also serves as a cost-effective alternative to top-ranked proprietary reasoning LLMs such as o3-mini-high for genome projects and hospitals that have specific data security needs. In this study, we developed a framework that fully automates genetic variant classification according to the American College of Medical Genetics and Genomics (ACMG) and Association for Molecular Pathology (AMP) guidelines and the Clinical Genome Resource (ClinGen) recommendations. Two state-of-the-art LLMs, DeepSeek-R1 and o3-mini-high, were tested for their performance in variant classification. We demonstrated that, through careful prompt engineering and the creation of ACMG-rule-specific knowledge bases, DeepSeek-R1 outperformed o3-mini-high and achieved high sensitivity and 100% specificity in interpreting ACMG rules that require understanding literature-based evidence. In further testing on 150 variants curated by ClinGen experts, DeepSeek-R1 demonstrated performance on par with human curators. Finally, using 150 ClinVar variants with conflicting interpretations, we showed that the framework can also be used for reanalysis. Our study provides the first LLM framework capable of fully automated variant classification in the diagnosis of genetic diseases and in variant reanalysis.

Version published to 10.1101/2025.06.03.25328923 on medRxiv
Jun 4, 2025

Related articles

Understanding Pathways in Bioinformatics, Genomics, and Health Applications

This article has 1 author:
1. Diptarup Mallick
This article has no evaluations. Latest version: Jan 19, 2026

VUS. Life: Leveraging Vector Embeddings for Rapid and Accurate Pathogenicity Prediction of Genetic Variants

This article has 6 authors:
1. Jiawei Wu
2. Marissa Stutzman
3. Michael Muriello
4. Joy Lincoln
5. Donald G. Basel
6. Xiaowu Gai
This article has no evaluations. Latest version: Jan 21, 2026

Large Language Models Enhance Molecular Diagnoses of Mendelian Disorders via A Novel Logic

This article has 15 authors:
1. Zefu Chen
2. Jihao Cai
3. Yongxin Yang
4. Sen Zhao
5. Guozhuang Li
6. Kexin Xu
7. Qing Li
8. Timothy Hospedales
9. Lina Zhao
10. Zhongmin Zhang
11. Zhihong Wu
12. Guixing Qiu
13. Terry Jianguo Zhang
14. Pengfei Liu
15. Nan Wu
This article has no evaluations. Latest version: Dec 22, 2025
