Evaluating a Multi-Modal Large Language Model for Ophthalmology Triage
Abstract
Background/Purpose: Ophthalmic triage is challenging for non-specialists because of limited training and the rising global burden of eye disease. This study evaluates a multimodal framework that combines clinical text and ophthalmic imaging with large language models (LLMs). Hallucination detection and chain-of-thought (CoT) reasoning were incorporated to improve diagnostic accuracy.
Methods: A dataset of 41 ophthalmology cases from a Singapore restructured hospital was pre-processed with acronym expansion, sentence reconstruction, and hallucination detection. To address the small dataset size, 100 synthetic cases were generated via one-shot GPT-4 prompting and validated by semantic checks and ophthalmologist review. Three diagnostic approaches were tested: Text-Only, Image-Assisted, and Image with CoT. Diagnostic performance was quantified by mapping diagnoses to SNOMED-CT and computing a dissimilarity score that reflects the semantic distance between predicted and reference diagnoses (lower is better).
Results: The synthetic dataset included anterior segment (n=40), posterior segment (n=35), and extraocular (n=25) cases. The Text-Only approach yielded a mean dissimilarity of 6.353 ± 1.685. Adding image inputs (Image-Assisted) reduced this to 5.234 ± 1.305, and CoT prompting provided further gains when imaging cues were ambiguous.
Conclusions: The multimodal pipeline improved diagnostic alignment in ophthalmology triage. Image inputs enhanced accuracy, and CoT reasoning reduced errors arising from ambiguous features, supporting the pipeline's potential as an accurate tool for ophthalmology triage.
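The abstract describes the dissimilarity score only as reflecting the semantic distance between SNOMED-CT-mapped diagnoses and does not give its formula. As a minimal sketch, assuming the score is the number of edges on the shortest path between two concepts in the SNOMED-CT "is-a" hierarchy, and assuming "±" denotes the sample standard deviation, the idea can be illustrated on a tiny hypothetical hierarchy. The hierarchy `IS_A` and the helpers `build_graph`, `dissimilarity`, and `summarize` are illustrative names, not part of the authors' pipeline. The concept names are not real SNOMED-CT identifiers, and the toy values will not reproduce the reported magnitudes.

```python
from collections import deque
from statistics import mean, stdev

# HYPOTHETICAL toy "is-a" hierarchy (child -> parent) standing in for SNOMED-CT.
# Concept names are illustrative only; they are not real SNOMED-CT concept IDs.
IS_A = {
    "disorder of eye": "clinical finding",
    "disorder of anterior segment": "disorder of eye",
    "disorder of posterior segment": "disorder of eye",
    "cataract": "disorder of anterior segment",
    "keratitis": "disorder of anterior segment",
    "retinal detachment": "disorder of posterior segment",
    "diabetic retinopathy": "disorder of posterior segment",
}


def build_graph(is_a):
    """Turn child -> parent edges into an undirected adjacency map."""
    graph = {}
    for child, parent in is_a.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def dissimilarity(predicted, reference, graph):
    """ASSUMED metric: edge count of the shortest path between two concepts."""
    if predicted == reference:
        return 0
    seen = {predicted}
    queue = deque([(predicted, 0)])
    while queue:  # breadth-first search reaches the reference via a shortest path first
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == reference:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    raise ValueError(f"no path between {predicted!r} and {reference!r}")


def summarize(pairs, graph):
    """Mean and sample standard deviation of scores over (predicted, reference) pairs."""
    scores = [dissimilarity(p, r, graph) for p, r in pairs]
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    graph = build_graph(IS_A)
    pairs = [
        ("cataract", "keratitis"),           # sibling disorders: distance 2
        ("cataract", "retinal detachment"),  # different segments: distance 4
        ("keratitis", "keratitis"),          # exact match: distance 0
    ]
    avg, sd = summarize(pairs, graph)
    print(f"mean dissimilarity = {avg:.3f} +/- {sd:.3f}")
```

Breadth-first search suffices here because every edge has unit weight. A real evaluation would query an actual SNOMED-CT release and might weight edges differently or use another semantic-similarity measure entirely.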