AttCatVgg-Net: An Explainable Multi-Output Deep Learning Framework for Cataract Stage Classification and Visual Acuity Regression Using Multicolor Fundus Images

Abstract

Purpose

To develop and evaluate an attention-guided deep learning model that uses the multicolor imaging module of Spectralis Optical Coherence Tomography (OCT) for automated cataract severity classification and visual acuity (VA) prediction.

Methods

We analyzed 314 multicolor fundus images from 169 patients. Images were preprocessed with an enhanced Retinex algorithm and segmented into three concentric macular zones: Zone 1 (fovea, central 1.5 mm diameter), Zone 2 (parafovea, 1.5-2.5 mm ring), and Zone 3 (perifovea, >2.5 mm radius). A multi-output convolutional neural network (AttCatVgg-Net), built on a VGG-16 backbone and enhanced with a Convolutional Block Attention Module (CBAM), was trained to perform three-class cataract grading (normal-to-mild, moderate, severe) and VA regression simultaneously. Model performance was assessed with accuracy, area under the ROC curve (AUC), F1-score, and regression metrics. Statistical analyses included the Wilcoxon signed-rank test and the Spearman correlation test.
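
A minimal PyTorch sketch of the multi-output design described above follows: a VGG-16 convolutional backbone, a CBAM block, and two heads for cataract grading and VA regression. The layer sizes, use of ImageNet-pretrained weights, and equal loss weighting are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of AttCatVgg-Net: VGG-16 features + CBAM + two output heads.
# Hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        attn = torch.cat([x.mean(dim=1, keepdim=True),
                          x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(attn))

class AttCatVggNet(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        # Convolutional layers of VGG-16 (ImageNet weights assumed).
        self.backbone = vgg16(weights="IMAGENET1K_V1").features
        self.cbam = CBAM(512)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.grade_head = nn.Linear(512, num_classes)  # cataract severity logits
        self.va_head = nn.Linear(512, 1)               # visual-acuity regression

    def forward(self, x):
        f = self.pool(self.cbam(self.backbone(x))).flatten(1)
        return self.grade_head(f), self.va_head(f).squeeze(-1)

# Joint loss: cross-entropy for grading plus MSE for VA (equal weights assumed).
model = AttCatVggNet()
logits, va = model(torch.randn(2, 3, 224, 224))
loss = (nn.CrossEntropyLoss()(logits, torch.tensor([0, 2]))
        + nn.MSELoss()(va, torch.tensor([0.1, 0.6])))
```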

Results

For cataract grading, the integrated model using all wavelengths and zones achieved 92.5% accuracy, an AUC of 0.947, and an F1-score of 92.1%. The green channel alone reached 90.1% accuracy and 0.93 AUC, while the red channel performed worse (76.3% accuracy, 0.83 AUC). Among anatomical zones, Zone 1 (fovea) and Zone 3 (perifovea) achieved 84.3% and 84.7% accuracy with AUCs of 0.88 and 0.89, respectively, whereas Zone 2 underperformed (60.4% accuracy, 0.71 AUC). For VA prediction, the full model achieved a mean absolute error (MAE) of 0.1181 and a coefficient of determination (R²) of 0.7759. The green channel showed the strongest correlation with actual VA (correlation coefficient = 0.823, p < 0.001), followed by the green-red (0.817) and blue (0.809) channels; it also achieved the lowest mean squared error (MSE = 0.0369) and root mean square error (RMSE = 0.1920), outperforming the other channels.
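
For reference, the classification and regression metrics reported above can be computed as in the following sketch using scikit-learn and SciPy. The arrays are placeholder data, not study results, and macro-averaged F1 with one-vs-rest AUC are assumptions about the averaging scheme.

```python
# Hedged sketch of the evaluation metrics; all values below are placeholders.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import (accuracy_score, f1_score, roc_auc_score,
                             mean_absolute_error, mean_squared_error, r2_score)

# --- Classification: three-class cataract grading ---
y_true = np.array([0, 1, 2, 1])           # true severity grades
y_prob = np.array([[0.8, 0.1, 0.1],       # softmax outputs per class
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7],
                   [0.3, 0.6, 0.1]])
y_pred = y_prob.argmax(axis=1)

print("accuracy :", accuracy_score(y_true, y_pred))
print("F1 (macro):", f1_score(y_true, y_pred, average="macro"))
print("AUC (OvR) :", roc_auc_score(y_true, y_prob, multi_class="ovr"))

# --- Regression: visual acuity prediction ---
va_true = np.array([0.1, 0.4, 0.8, 0.3])  # e.g. logMAR visual acuity
va_pred = np.array([0.15, 0.35, 0.7, 0.4])

print("MAE :", mean_absolute_error(va_true, va_pred))
print("MSE :", mean_squared_error(va_true, va_pred))
print("RMSE:", np.sqrt(mean_squared_error(va_true, va_pred)))
print("R^2 :", r2_score(va_true, va_pred))
rho, p = spearmanr(va_true, va_pred)      # Spearman correlation, as in Methods
print("Spearman r:", rho, "p:", p)
```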

Conclusions

Attention-guided deep learning applied to Spectralis OCT multicolor imaging enables accurate, objective classification of cataract severity and estimation of cataract-related visual acuity loss.
