Comparison of Foundation and Supervised Learning-Based Models for Detection of Referable Glaucoma from Fundus Photographs
Abstract
Purpose
To compare the performance of a foundation model and a supervised learning-based model for detecting referable glaucoma from fundus photographs.
Design
Evaluation of diagnostic technology.
Participants
6,116 participants from the Los Angeles County Department of Health Services Teleretinal Screening Program.
Methods
Fundus photographs were labeled for referable glaucoma (cup-to-disc ratio ≥ 0.6) by certified optometrists. Four deep learning models were trained on cropped and uncropped images (Training N = 8,996; Validation N = 3,002) using two architectures: a vision transformer with self-supervised pretraining on fundus photographs (RETFound) and a convolutional neural network (VGG-19). Models were evaluated on a held-out test set (N = 1,000) labeled by glaucoma specialists and an external test set (N = 300) from University of Southern California clinics. Performance was assessed while varying training set size and stratifying by demographic factors. XRAI was used for saliency mapping.
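The abstract does not give implementation details; purely as an illustration of the supervised learning arm, a minimal PyTorch sketch of adapting an ImageNet-pretrained VGG-19 for binary classification of referable glaucoma might look like the following (the learning rate, loss, and training loop are assumptions, not values from the study).

```python
import torch
import torch.nn as nn
from torchvision import models

# Illustrative sketch only: adapt an ImageNet-pretrained VGG-19 for binary
# referable-glaucoma classification. Hyperparameters are not from the study.
model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)

# Replace the final fully connected layer with a single-logit head.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 1)

criterion = nn.BCEWithLogitsLoss()  # binary target: referable vs. not referable
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """One optimization step on a batch of fundus photographs."""
    model.train()
    optimizer.zero_grad()
    logits = model(images).squeeze(1)   # shape: (batch,)
    loss = criterion(logits, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```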
Main Outcome Measures
Area under the receiver operating characteristic curve (AUC-ROC) and threshold-specific metrics.
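For context, the AUC-ROC and a Youden-optimal operating point (the threshold maximizing sensitivity + specificity − 1) can be computed from predicted probabilities with scikit-learn. This is a generic sketch, not the study's analysis code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def auc_and_youden(y_true, y_score):
    """Return AUC-ROC plus sensitivity/specificity at the Youden-optimal threshold."""
    auc = roc_auc_score(y_true, y_score)
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    j = tpr - fpr                      # Youden's J statistic at each candidate threshold
    best = np.argmax(j)
    sensitivity = tpr[best]
    specificity = 1.0 - fpr[best]
    return auc, thresholds[best], sensitivity, specificity
```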
Results
The cropped image VGG-19 model achieved the highest AUC-ROC (0.924 [0.907-0.940]), comparable (p = 0.07) to the cropped image RETFound model (0.911 [0.892-0.930]), which achieved the highest Youden-optimal performance (sensitivity 82.6%, specificity 88.2%) and F1 score (0.801). Within each architecture, cropped image models outperformed their uncropped counterparts (p < 0.001 for AUC-ROC comparisons). RETFound models held a performance advantage when trained on smaller datasets (N < 2,000 images), and the uncropped image RETFound model performed best on external data (p < 0.001 for AUC-ROC comparisons). The cropped image RETFound model performed consistently across ethnic groups (p = 0.20), while the other models did not (p < 0.04); performance did not vary by age or gender. Saliency maps for both architectures consistently highlighted the optic nerve.
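The abstract reports p-values for pairwise AUC-ROC comparisons without naming the statistical test; a paired bootstrap over the shared test set is one common way to obtain such a comparison. The sketch below is a generic illustration with assumed function names, not the study's procedure.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_difference(y_true, scores_a, scores_b, n_boot=2000, seed=0):
    """Paired bootstrap comparison of two models' AUC-ROC on the same test set.

    Returns the observed AUC difference and a two-sided bootstrap p-value.
    Generic illustration only; not necessarily the test used in the study.
    """
    rng = np.random.default_rng(seed)
    observed = roc_auc_score(y_true, scores_a) - roc_auc_score(y_true, scores_b)
    diffs = []
    n = len(y_true)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)            # resample cases with replacement
        if len(np.unique(y_true[idx])) < 2:    # AUC needs both classes present
            continue
        d = (roc_auc_score(y_true[idx], scores_a[idx])
             - roc_auc_score(y_true[idx], scores_b[idx]))
        diffs.append(d)
    diffs = np.asarray(diffs)
    # Two-sided p-value from the proportion of resampled differences crossing zero.
    p = 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())
    return observed, min(p, 1.0)
```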
Conclusion
While both RETFound and VGG-19 models performed well in classifying referable glaucoma, foundation models may be preferable when training data are limited and when domain shift is expected. Training on images cropped to the optic nerve region improves performance regardless of architecture but may reduce model generalizability.