DeepSeek as the paradigm shift in rare disease diagnosis – the power of a fully automated genetic variant classification system
Abstract
Large language models (LLMs) have been extensively tested for incorporation into medical applications in recent years, yet their potential in clinical genetics, particularly in diagnosing rare diseases, remains underexplored. Recent advances in LLMs have improved their reasoning capabilities and transparency, facilitating significant enhancements in clinical workflow design. The open-source DeepSeek model also serves as a cost-effective alternative to top-ranked proprietary reasoning LLMs such as o3-mini-high for genome projects and hospitals that have specific data security needs. In this study, we developed a framework that fully automates genetic variant classification according to the American College of Medical Genetics and Genomics (ACMG) and Association for Molecular Pathology (AMP) guidelines and Clinical Genome Resource (ClinGen) recommendations. Two state-of-the-art LLMs, DeepSeek-R1 and o3-mini-high, were tested for their performance in variant classification. We demonstrated that, through careful prompt engineering and the creation of ACMG rule-specific knowledge bases, DeepSeek-R1 outperformed o3-mini-high and achieved high sensitivity and 100% specificity in interpreting ACMG rules that require understanding literature-based evidence. In further testing on 150 variants curated by ClinGen experts, DeepSeek-R1 demonstrated performance on par with human curators. Finally, we showed that the framework can also be used for reanalysis, using 150 ClinVar variants with conflicting interpretations. Our study provides the first LLM framework capable of fully automated variant classification for the diagnosis of genetic diseases and for variant reanalysis.