OphthaBERT: Automated Glaucoma Diagnosis from Clinical Notes

Rishi Shah
Mousa Moradi
Sedigheh Eslami
Asahi Fujita
Kanza Aziz
Niloufar Bineshfar
Tobias Elze
Mohammad Eslami
Saber Kazeminasab
Daniel Liebman
Saeid Rasouli
Daniel Vu
Mengyu Wang
Jithin Yohannan
Nazlee Zebardast

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Glaucoma is a leading cause of irreversible blindness worldwide, with early intervention often being crucial. Research into the underpinnings of glaucoma often relies on electronic health records (EHRs) to identify patients with glaucoma and their subtypes. However, current methods for identifying glaucoma patients from EHRs are often inaccurate or infeasible at scale, relying on International Classification of Diseases (ICD) codes or manual chart reviews. To address this limitation, we introduce (1) OphthaBERT, a powerful general clinical ophthalmology language model trained on over 2 million diverse clinical notes, and (2) a fine-tuned variant of OphthaBERT that automatically extracts binary and subtype glaucoma diagnoses from clinical notes. The base OphthaBERT model is a robust encoder, outperforming state-of-the-art clinical encoders in masked token prediction on out-of-distribution ophthalmology clinical notes and binary glaucoma classification with limited data. We report significant binary classification performance improvements in low-data regimes (p < 0.001, Bonferroni corrected). OphthaBERT is also able to achieve superior classification performance for both binary and subtype diagnosis, outperforming even fine-tuned large decoder-only language models at a fraction of the computational cost. We demonstrate a 0.23-point increase in macro-F1 for subtype diagnosis over ICD codes and strong binary classification performance when externally validated at Wilmer Eye Institute. OphthaBERT provides an interpretable, equitable framework for general ophthalmology language modeling and automated glaucoma diagnosis.

Version published to 10.1101/2025.06.08.25329151 on medRxiv
Jun 9, 2025

Discuss this preprint

Listed in

Abstract

Article activity feed