Evaluating a Multi-Modal Large Language Model for Ophthalmology Triage

Caius Goh
Jabez Ng
Au Wei Yung
Clarence See
Alva Lim
Fan Xiuyi
Kelvin Li


Abstract

Background/Purpose: Ophthalmic triage is challenging for non-specialists because of limited training and the rising global burden of eye disease. This study evaluates a multimodal framework that integrates clinical text and ophthalmic imaging with large language models (LLMs). Hallucination detection and chain-of-thought (CoT) reasoning were incorporated to improve diagnostic accuracy.

Methods: A dataset of 41 ophthalmology cases from a Singapore restructured hospital was pre-processed with acronym expansion, sentence reconstruction, and hallucination detection. To address the small dataset size, 100 synthetic cases were generated via one-shot GPT-4 prompting and validated by semantic checks and ophthalmologist review. Three diagnostic approaches were tested: Text-Only, Image-Assisted, and Image with CoT. Diagnostic performance was quantified using SNOMED-CT mapping and a dissimilarity score reflecting the semantic distance between predicted and reference diagnoses (lower values indicate closer agreement).

Results: The synthetic dataset included anterior segment (n=40), posterior segment (n=35), and extraocular (n=25) cases. The Text-Only approach yielded a mean dissimilarity of 6.353 ± 1.685. Adding image assistance reduced this to 5.234 ± 1.305, while CoT prompting provided further gains when imaging cues were ambiguous.

Conclusions: The multimodal pipeline improved diagnostic alignment in ophthalmology triage. Image inputs enhanced accuracy, and CoT reasoning reduced errors arising from ambiguous features, supporting the pipeline's potential as an accurate tool for ophthalmology triage.
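The abstract does not state exactly how the dissimilarity score is computed from the SNOMED-CT mappings. A minimal sketch, assuming the score is the shortest-path length between the predicted and reference concepts in an is-a hierarchy (one common way to measure semantic distance in an ontology, not necessarily the authors' method). The toy hierarchy, its concept labels, and the functions `dissimilarity` and `summarise` are hypothetical illustrations; the labels are not real SNOMED CT concept identifiers.

```python
from collections import deque
from statistics import mean, stdev

# HYPOTHETICAL toy "is-a" hierarchy standing in for SNOMED CT.
# Each child maps to a list of parents (SNOMED CT allows multiple parents).
# Labels are illustrative only, not real SNOMED CT concept IDs.
IS_A = {
    "disorder of eye": ["clinical finding"],
    "disorder of anterior segment": ["disorder of eye"],
    "disorder of posterior segment": ["disorder of eye"],
    "keratitis": ["disorder of anterior segment"],
    "bacterial keratitis": ["keratitis"],
    "acute angle-closure glaucoma": ["disorder of anterior segment"],
    "retinal detachment": ["disorder of posterior segment"],
}


def undirected_graph(is_a):
    """Turn child -> parents edges into an undirected adjacency map."""
    graph = {}
    for child, parents in is_a.items():
        for parent in parents:
            graph.setdefault(child, set()).add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


GRAPH = undirected_graph(IS_A)


def dissimilarity(predicted, reference, graph=GRAPH):
    """Assumed metric: breadth-first shortest-path length (0 = same concept)."""
    if predicted == reference:
        return 0
    seen = {predicted}
    queue = deque([(predicted, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == reference:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    raise ValueError(f"no path between {predicted!r} and {reference!r}")


def summarise(pairs):
    """Mean and sample standard deviation over (predicted, reference) pairs."""
    scores = [dissimilarity(p, r) for p, r in pairs]
    return mean(scores), stdev(scores)  # stdev needs at least two pairs


pairs = [
    ("bacterial keratitis", "keratitis"),             # distance 1
    ("keratitis", "acute angle-closure glaucoma"),    # distance 2
    ("retinal detachment", "bacterial keratitis"),    # distance 5
]
print(summarise(pairs))
```

In a tree-shaped hierarchy the shortest path runs through the lowest common ancestor, so a near-miss within the same anatomical segment scores lower than a prediction in a different segment. Because this toy hierarchy is far shallower than SNOMED CT, its scores are not on the same scale as the values reported in the abstract.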

Version published to 10.20944/preprints202509.1349.v1
Sep 18, 2025

