DeepSeek Outperforms GPT-4o in Multispecialty Ophthalmic Diagnosis: A Blinded Expert Evaluation of 33 Complex Cases

Abstract

Purpose: To compare the diagnostic and treatment performance of two large language models (LLMs), DeepSeek (DS) and GPT-4o, in ophthalmology using standardized residency examination cases.

Design: Cross-sectional comparative study.

Participants: Thirty-three representative cases drawn from the Chinese Ophthalmology Residency Examination Database, covering 8 subspecialties.

Methods: Each case was submitted to DS and GPT-4o with identical prompts instructing the model to act as a senior ophthalmologist. Three independent ophthalmologists conducted double-blind evaluations of each model's outputs. For each of three tasks (diagnosis, differential diagnosis, and treatment), accuracy was scored on a 10-point Likert scale and completeness on a 6-point Likert scale. Mean scores were compared using paired statistical tests and two-way ANOVA.

Main Outcome Measures: Accuracy and completeness scores across the diagnosis, differential diagnosis, and treatment tasks.

Results: Across all cases, DS achieved significantly higher accuracy than GPT-4o for diagnosis (8.04 vs 6.46, P < 0.0001), differential diagnosis (7.52 vs 5.50, P < 0.0001), and treatment (7.62 vs 6.65, P = 0.002). Completeness scores were also higher for DS in diagnosis (4.86 vs 3.69, P < 0.0001), differential diagnosis (4.44 vs 3.24, P < 0.0001), and treatment (4.61 vs 3.90, P = 0.0001). Subspecialty analyses showed the largest advantage for DS in retinal diseases, glaucoma, strabismus and amblyopia, and optic nerve disorders.

Conclusions: In standardized ophthalmology case evaluations, DS outperformed GPT-4o in both accuracy and completeness, particularly in subspecialties requiring complex reasoning. These findings support the potential role of domain-optimized LLMs as adjuncts in ophthalmic education and clinical decision support, with further research warranted in multimodal and real-world clinical settings.
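To make the size of the reported gaps easier to read, the mean scores quoted in the Results can be tabulated and differenced. The sketch below uses only the means stated in the abstract; the names `REPORTED_MEANS` and `mean_differences` are illustrative and not taken from the study, and the snippet does not reproduce the paired tests or the ANOVA, which would require the per-case ratings.

```python
# Mean scores as reported in the abstract: (DeepSeek, GPT-4o).
# Accuracy uses a 10-point scale, completeness a 6-point scale.
REPORTED_MEANS = {
    ("accuracy", "diagnosis"): (8.04, 6.46),
    ("accuracy", "differential diagnosis"): (7.52, 5.50),
    ("accuracy", "treatment"): (7.62, 6.65),
    ("completeness", "diagnosis"): (4.86, 3.69),
    ("completeness", "differential diagnosis"): (4.44, 3.24),
    ("completeness", "treatment"): (4.61, 3.90),
}


def mean_differences(means=REPORTED_MEANS):
    """Return the DS minus GPT-4o gap for each (measure, task), rounded to 2 decimals."""
    return {key: round(ds - gpt, 2) for key, (ds, gpt) in means.items()}


if __name__ == "__main__":
    for (measure, task), diff in mean_differences().items():
        print(f"{measure:12s} {task:24s} DS - GPT-4o = {diff:+.2f}")
```

Because the two measures use different scales, the accuracy and completeness gaps are not directly comparable with each other; within each measure, the differential diagnosis task shows the widest gap.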
