Multimodal Deep Learning for Longitudinal Prediction of Glaucoma Progression Using Sequential RNFL, Visual Field, and Clinical Data

Abstract

Forecasting glaucoma progression remains a major challenge in preventing irreversible vision loss. We developed and validated a multimodal, longitudinal deep learning framework to predict future progression using a large retrospective cohort of 10,864 patients from Mass Eye and Ear. The model integrates sequential structural (OCT RNFL scans), functional (visual-field maps), and clinical data from a two-year observation window to forecast progression over the subsequent two- to four-year horizon. Four backbone architectures (ConvNeXt-V2, ViT, MobileNet-V2, EfficientNet-B0) were coupled with a bidirectional LSTM to capture temporal dynamics. The ConvNeXt-V2-based model achieved an AUC of 0.97 and accuracy of 0.94–0.96, outperforming the other backbones, with robust performance across sex and race subgroups and only modest attenuation in patients older than 70 years. Saliency maps localized to clinically relevant arcuate bundles, supporting biological plausibility. By effectively fusing multimodal data over time, this framework enables accurate, interpretable, and equitable long-horizon risk stratification, advancing personalized glaucoma management.
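The fusion scheme the abstract describes, per-visit image features concatenated with clinical covariates and passed through a bidirectional LSTM to a progression-risk head, could be sketched as below. This is a minimal illustration, not the authors' implementation: the tiny convolutional encoder stands in for the ConvNeXt-V2/ViT/MobileNet-V2/EfficientNet-B0 backbones, and all layer sizes, channel counts, and variable names are assumptions.

```python
import torch
import torch.nn as nn

class MultimodalProgressionModel(nn.Module):
    """Hypothetical sketch of the multimodal longitudinal pipeline:
    per-visit image features + clinical vector -> BiLSTM -> risk score."""

    def __init__(self, img_feat_dim=64, clin_dim=8, hidden=32):
        super().__init__()
        # Stand-in encoder for the stacked RNFL + visual-field maps
        # (the paper uses pretrained backbones such as ConvNeXt-V2).
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, img_feat_dim),
        )
        # Bidirectional LSTM over the sequence of visits.
        self.lstm = nn.LSTM(img_feat_dim + clin_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # progression logit

    def forward(self, imgs, clin):
        # imgs: (B, T, 2, H, W) -- RNFL + VF map per visit
        # clin: (B, T, clin_dim) -- clinical covariates per visit
        B, T = imgs.shape[:2]
        feats = self.encoder(imgs.flatten(0, 1)).view(B, T, -1)
        x = torch.cat([feats, clin], dim=-1)       # fuse modalities per visit
        out, _ = self.lstm(x)                      # temporal dynamics
        return torch.sigmoid(self.head(out[:, -1]))  # risk from last step

# Toy forward pass: 4 patients, 5 visits in the observation window.
model = MultimodalProgressionModel()
imgs = torch.randn(4, 5, 2, 64, 64)
clin = torch.randn(4, 5, 8)
risk = model(imgs, clin)
print(risk.shape)  # torch.Size([4, 1])
```

The key design choice mirrored here is late fusion per visit: each time point yields one joint feature vector, so the recurrent layer sees aligned structural, functional, and clinical information at every step.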
