AttCatVgg-Net: An Explainable Multi-Output Deep Learning Framework for Cataract Stage Classification and Visual Acuity Regression Using Multicolor Fundus Images
Abstract
Purpose
The purpose of this study was to develop and evaluate an attention-guided deep learning model using the multicolor imaging module of Spectralis Optical Coherence Tomography (OCT) for automated cataract severity classification and visual acuity (VA) prediction.
Methods
We analyzed 314 multicolor fundus images from 169 patients. Images were preprocessed with an enhanced Retinex algorithm and segmented into three concentric macular zones: Zone 1 (fovea, central 1.5 mm diameter), Zone 2 (parafovea, 1.5-2.5 mm ring), and Zone 3 (perifovea, beyond 2.5 mm). A multi-output convolutional neural network (AttCatVgg-Net), based on VGG-16 and enhanced with a Convolutional Block Attention Module (CBAM), was trained to simultaneously perform three-class cataract classification (normal-to-mild, moderate, severe) and VA regression. Model performance was assessed using accuracy, area under the ROC curve (AUC), F1-score, and regression metrics. Statistical analyses included the Wilcoxon signed-rank test and the Spearman correlation test.
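The abstract does not specify which Retinex variant was used or the image pixel scale, so the following is only a minimal sketch: a plain single-scale Retinex for illumination correction and concentric zone masks around the fovea, assuming the 1.5 mm and 2.5 mm figures denote diameters and treating mm_per_px and fovea_xy as hypothetical inputs.

```python
# Minimal preprocessing sketch (Python/OpenCV). The paper's "enhanced Retinex"
# is not detailed in the abstract; single-scale Retinex stands in for it here.
import numpy as np
import cv2

def single_scale_retinex(img, sigma=80.0):
    """log(image) - log(Gaussian-blurred illumination estimate), rescaled to 8-bit."""
    img = img.astype(np.float32) + 1.0               # avoid log(0)
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)
    retinex = np.log(img) - np.log(illumination)
    lo, hi = retinex.min(), retinex.max()
    return ((retinex - lo) / (hi - lo + 1e-8) * 255.0).astype(np.uint8)

def zone_masks(shape, fovea_xy, mm_per_px):
    """Boolean masks for the three concentric macular zones (diameters assumed)."""
    h, w = shape[:2]
    ys, xs = np.ogrid[:h, :w]
    r_mm = np.hypot(xs - fovea_xy[0], ys - fovea_xy[1]) * mm_per_px
    zone1 = r_mm <= 0.75                      # fovea: central 1.5 mm diameter
    zone2 = (r_mm > 0.75) & (r_mm <= 1.25)    # parafovea: 1.5-2.5 mm ring
    zone3 = r_mm > 1.25                       # perifovea: beyond 2.5 mm
    return zone1, zone2, zone3
```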
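Likewise, the abstract does not detail the network itself; a minimal PyTorch sketch of a CBAM-augmented, multi-output VGG-16 could look as follows, with the CBAM insertion point, head sizes, and loss weighting all assumed rather than taken from the paper.

```python
# Sketch of a CBAM-augmented VGG-16 with two output heads (architecture
# details are assumptions; only the components named in the abstract are given).
import torch
import torch.nn as nn
from torchvision import models

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention, then spatial attention."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # Channel attention from average- and max-pooled descriptors
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention from channel-wise mean and max maps
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

class AttCatVggNet(nn.Module):
    """VGG-16 backbone + CBAM with a grading head and a VA regression head."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.backbone = models.vgg16(weights="IMAGENET1K_V1").features
        self.cbam = CBAM(512)                          # VGG-16 features end at 512 channels
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.grade_head = nn.Linear(512, n_classes)    # normal-to-mild / moderate / severe
        self.va_head = nn.Linear(512, 1)               # visual acuity regression

    def forward(self, x):
        f = self.pool(self.cbam(self.backbone(x))).flatten(1)
        return self.grade_head(f), self.va_head(f).squeeze(1)

# Joint training objective (weighting assumed):
# loss = nn.CrossEntropyLoss()(grade_logits, y_grade) + nn.MSELoss()(va_pred, y_va)
```

Both heads read the same attended feature vector, which is the usual way a shared backbone lets the grading and VA tasks regularize each other during joint training.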
Results
For cataract grading, the integrated model using all wavelengths and zones achieved 92.5% accuracy, an AUC of 0.947, and an F1-score of 92.1%. The green channel alone achieved 90.1% accuracy and an AUC of 0.93, while the red channel yielded lower performance (76.3% accuracy, 0.83 AUC). Among anatomical zones, Zone 1 (fovea) and Zone 3 (perifovea) achieved 84.3% and 84.71% accuracy with AUCs of 0.88 and 0.89, respectively, whereas Zone 2 (parafovea) underperformed (60.41% accuracy, 0.71 AUC). For VA prediction, the full model achieved a mean absolute error (MAE) of 0.1181 and a coefficient of determination (R²) of 0.7759. The green channel showed the strongest correlation with actual VA (correlation coefficient = 0.823, p < 0.001), followed by the green-red (0.817) and blue (0.809) channels, and it also achieved the lowest mean squared error (MSE = 0.0369) and root mean squared error (RMSE = 0.1920), outperforming the other channels.
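For concreteness, the reported metrics correspond to standard scikit-learn and SciPy calls, as in the sketch below; the macro averaging for AUC and F1 and the pairing of predicted against actual VA in the Wilcoxon test are assumptions, since the abstract does not specify them.

```python
# Illustrative evaluation helper; y_true holds grade labels, y_prob holds
# (n, 3) class probabilities, va_true/va_pred hold visual acuity values.
import numpy as np
from scipy.stats import spearmanr, wilcoxon
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, r2_score, roc_auc_score)

def evaluate(y_true, y_prob, va_true, va_pred):
    y_hat = np.argmax(y_prob, axis=1)
    mse = mean_squared_error(va_true, va_pred)
    rho, p_rho = spearmanr(va_true, va_pred)   # e.g., 0.823 for the green channel
    _, p_w = wilcoxon(va_true, va_pred)        # paired signed-rank test (pairing assumed)
    return {
        "accuracy": accuracy_score(y_true, y_hat),
        "auc": roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"),
        "f1": f1_score(y_true, y_hat, average="macro"),
        "mae": mean_absolute_error(va_true, va_pred),
        "mse": mse,
        "rmse": np.sqrt(mse),
        "r2": r2_score(va_true, va_pred),
        "spearman_rho": rho, "spearman_p": p_rho,
        "wilcoxon_p": p_w,
    }
```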
Conclusions
Attention-guided deep learning applied to Spectralis OCT multicolor imaging enables accurate, objective classification of cataract severity and estimation of cataract-related visual acuity loss.