AttCatVgg-Net: An Explainable Multi-Output Deep Learning Framework for Cataract Stage Classification and Visual Acuity Regression Using Multicolor Fundus Images
Abstract
Purpose
The purpose of this study was to develop and evaluate an attention-guided deep learning model using the multicolor imaging module of Spectralis Optical Coherence Tomography (OCT) for automated cataract severity classification and visual acuity (VA) prediction.
Methods
We analyzed 314 multicolor fundus images from 169 patients. Images were preprocessed with an enhanced Retinex algorithm and segmented into three concentric macular zones: Zone 1 (fovea, central 1.5 mm diameter), Zone 2 (parafovea, 1.5-2.5 mm ring), and Zone 3 (perifovea, beyond 2.5 mm). A multi-output convolutional neural network (AttCatVgg-Net), based on VGG-16 and enhanced with a Convolutional Block Attention Module (CBAM), was trained to simultaneously perform three-class cataract classification (normal-to-mild, moderate, severe) and VA regression. Model performance was assessed using accuracy, area under the ROC curve (AUC), F1-score, and regression metrics. Statistical analyses included the Wilcoxon signed-rank test and the Spearman correlation test.
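The abstract does not specify which Retinex variant was used or the image pixel scale, so the following is only a minimal sketch: a plain single-scale Retinex for illumination correction and concentric zone masks around the fovea, assuming the 1.5 mm and 2.5 mm figures denote diameters and treating mm_per_px and fovea_xy as hypothetical inputs.

```python
# Minimal preprocessing sketch (Python/OpenCV). The paper's "enhanced Retinex"
# is not detailed in the abstract; single-scale Retinex stands in for it here.
import numpy as np
import cv2

def single_scale_retinex(img, sigma=80.0):
    """log(image) - log(Gaussian-blurred illumination estimate), rescaled to 8-bit."""
    img = img.astype(np.float32) + 1.0               # avoid log(0)
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)
    retinex = np.log(img) - np.log(illumination)
    lo, hi = retinex.min(), retinex.max()
    return ((retinex - lo) / (hi - lo + 1e-8) * 255.0).astype(np.uint8)

def zone_masks(shape, fovea_xy, mm_per_px):
    """Boolean masks for the three concentric macular zones (diameters assumed)."""
    h, w = shape[:2]
    ys, xs = np.ogrid[:h, :w]
    r_mm = np.hypot(xs - fovea_xy[0], ys - fovea_xy[1]) * mm_per_px
    zone1 = r_mm <= 0.75                      # fovea: central 1.5 mm diameter
    zone2 = (r_mm > 0.75) & (r_mm <= 1.25)    # parafovea: 1.5-2.5 mm ring
    zone3 = r_mm > 1.25                       # perifovea: beyond 2.5 mm
    return zone1, zone2, zone3
```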
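Likewise, the abstract does not detail the network itself; a minimal PyTorch sketch of a CBAM-augmented, multi-output VGG-16 could look as follows, with the CBAM insertion point, head sizes, and loss weighting all assumed rather than taken from the paper.

```python
# Sketch of a CBAM-augmented VGG-16 with two output heads (architecture
# details are assumptions; only the components named in the abstract are given).
import torch
import torch.nn as nn
from torchvision import models

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention, then spatial attention."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # Channel attention from average- and max-pooled descriptors
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention from channel-wise mean and max maps
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

class AttCatVggNet(nn.Module):
    """VGG-16 backbone + CBAM with a grading head and a VA regression head."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.backbone = models.vgg16(weights="IMAGENET1K_V1").features
        self.cbam = CBAM(512)                          # VGG-16 features end at 512 channels
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.grade_head = nn.Linear(512, n_classes)    # normal-to-mild / moderate / severe
        self.va_head = nn.Linear(512, 1)               # visual acuity regression

    def forward(self, x):
        f = self.pool(self.cbam(self.backbone(x))).flatten(1)
        return self.grade_head(f), self.va_head(f).squeeze(1)

# Joint training objective (weighting assumed):
# loss = nn.CrossEntropyLoss()(grade_logits, y_grade) + nn.MSELoss()(va_pred, y_va)
```

Both heads read the same attended feature vector, which is the usual way a shared backbone lets the grading and VA tasks regularize each other during joint training.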
Results
For cataract grading, the integrated model using all wavelengths and zones achieved 92.5% accuracy, an AUC of 0.947, and an F1-score of 92.1%. The green channel alone achieved 90.1% accuracy and an AUC of 0.93, while the red channel yielded lower performance (76.3% accuracy, 0.83 AUC). Among anatomical zones, Zone 1 (fovea) and Zone 3 (perifovea) achieved 84.3% and 84.71% accuracy with AUCs of 0.88 and 0.89, respectively, whereas Zone 2 (parafovea) underperformed (60.41% accuracy, 0.71 AUC). For VA prediction, the full model achieved a mean absolute error (MAE) of 0.1181 and a coefficient of determination (R²) of 0.7759. The green channel showed the strongest correlation with actual VA (correlation coefficient = 0.823, p < 0.001), followed by the green-red (0.817) and blue (0.809) channels, and it also achieved the lowest mean squared error (MSE = 0.0369) and root mean squared error (RMSE = 0.1920), outperforming the other channels.
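For concreteness, the reported metrics correspond to standard scikit-learn and SciPy calls, as in the sketch below; the macro averaging for AUC and F1 and the pairing of predicted against actual VA in the Wilcoxon test are assumptions, since the abstract does not specify them.

```python
# Illustrative evaluation helper; y_true holds grade labels, y_prob holds
# (n, 3) class probabilities, va_true/va_pred hold visual acuity values.
import numpy as np
from scipy.stats import spearmanr, wilcoxon
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, r2_score, roc_auc_score)

def evaluate(y_true, y_prob, va_true, va_pred):
    y_hat = np.argmax(y_prob, axis=1)
    mse = mean_squared_error(va_true, va_pred)
    rho, p_rho = spearmanr(va_true, va_pred)   # e.g., 0.823 for the green channel
    _, p_w = wilcoxon(va_true, va_pred)        # paired signed-rank test (pairing assumed)
    return {
        "accuracy": accuracy_score(y_true, y_hat),
        "auc": roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"),
        "f1": f1_score(y_true, y_hat, average="macro"),
        "mae": mean_absolute_error(va_true, va_pred),
        "mse": mse,
        "rmse": np.sqrt(mse),
        "r2": r2_score(va_true, va_pred),
        "spearman_rho": rho, "spearman_p": p_rho,
        "wilcoxon_p": p_w,
    }
```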
Conclusions
Attention-guided deep learning applied to Spectralis OCT multicolor imaging enables accurate, objective classification of cataract severity and estimation of cataract-related visual acuity loss.