Evaluating a Multi-Modal Large Language Model for Ophthalmology Triage

Abstract

Background/Purpose: Ophthalmic triage is challenging for non-specialists because of limited training and a rising global burden of eye disease. This study evaluates a multimodal framework that integrates clinical text and ophthalmic imaging with large language models (LLMs). Textual consistency filtering and chain-of-thought (CoT) reasoning were incorporated to improve diagnostic accuracy.

Methods: A dataset of 56 ophthalmology cases from a Singapore restructured hospital was pre-processed with acronym expansion, sentence reconstruction, and textual consistency filtering. To address the small dataset size, 100 synthetic cases were generated via one-shot GPT-4 prompting and validated by semantic checks and ophthalmologist review. Three diagnostic approaches were tested: Text-Only, Image-Assisted, and Image with CoT. Diagnostic performance was quantified using a novel SNOMED-CT-based dissimilarity score, defined as the shortest path distance between the predicted and reference diagnoses in the ontology; lower scores indicate closer semantic alignment.

Results: The synthetic dataset included anterior segment (n = 40), posterior segment (n = 35), and extraocular (n = 25) cases. The text-only approach yielded a mean dissimilarity of 6.353 (95% CI: 4.668, 8.038). Adding image assistance reduced this to 5.234 (95% CI: 3.930, 6.540), while CoT prompting provided further gains when imaging cues were ambiguous.

Conclusions: The multimodal pipeline showed potential for improving diagnostic alignment in ophthalmology triage. Image inputs enhanced accuracy, and CoT reasoning reduced errors arising from ambiguous features, supporting the pipeline's feasibility as a pilot framework for ophthalmology triage.
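
The dissimilarity score in the Methods is described only as a shortest path distance in the ontology, so the following is a minimal sketch of how such a score might be computed, not the authors' implementation. It assumes the SNOMED CT "is-a" hierarchy is treated as an undirected, unweighted graph, which the abstract does not confirm. The concept names in `TOY_IS_A` are hypothetical placeholders rather than real SNOMED CT identifiers, and `build_graph`, `dissimilarity`, and `mean_dissimilarity` are illustrative helper names.

```python
from collections import deque

# Toy "is-a" edges as (child, parent) pairs. These concept names are
# hypothetical placeholders, NOT real SNOMED CT identifiers.
TOY_IS_A = [
    ("bacterial_keratitis", "keratitis"),
    ("keratitis", "corneal_disorder"),
    ("corneal_disorder", "eye_disorder"),
    ("retinal_detachment", "retinal_disorder"),
    ("retinal_disorder", "eye_disorder"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (child, parent) pairs.

    Assumption: edges are traversable in both directions, so two sibling
    concepts are connected through their shared ancestor.
    """
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def dissimilarity(graph, predicted, reference):
    """Return the number of edges on the shortest path between two concepts.

    Uses breadth-first search, which finds shortest paths in an unweighted
    graph. Returns 0 for identical concepts and None if no path exists.
    """
    if predicted not in graph or reference not in graph:
        raise KeyError("concept not present in the ontology graph")
    if predicted == reference:
        return 0
    seen = {predicted}
    queue = deque([(predicted, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == reference:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def mean_dissimilarity(graph, pairs):
    """Average the score over (predicted, reference) diagnosis pairs."""
    scores = [dissimilarity(graph, p, r) for p, r in pairs]
    return sum(scores) / len(scores)


graph = build_graph(TOY_IS_A)
```

In this toy graph an exact match scores 0, a prediction one level more general than the reference scores 1, and a prediction from a different branch of the hierarchy scores higher, which matches the abstract's reading that a lower mean dissimilarity reflects closer alignment between predicted and reference diagnoses.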
