A Novel Clinically Explainable Vision Transformer for OCT-Based Retinal Disease Classification: Integrating UniMIE Enhancement and Grad-CAM Interpretability

Vishal Upmanu
Jaya Singh
Pranshu Saxena
Jagendra Singh
Shilpa Srivast
Aprna Tripathi

Abstract

Accurate and explainable classification of Optical Coherence Tomography (OCT) images is essential for early detection and follow-up of retinal diseases such as Choroidal Neovascularization (CNV), Diabetic Macular Edema (DME), and Drusen. Conventional deep learning models, including transformers and convolutional neural networks (CNNs), achieve state-of-the-art classification results but lack the interpretability needed in clinical practice and struggle to separate diseases that differ only subtly. This paper proposes a clinically interpretable vision transformer (ViT) model that combines Universal Medical Image Enhancement (UniMIE)-based image enhancement, hierarchical ViT feature extraction, and Gradient-weighted Class Activation Mapping (Grad-CAM) visualization to improve both classification accuracy and interpretability. The proposed ViT model is evaluated on the UCSD and Mendeley OCT datasets, where it reaches a top accuracy of 98.84% in 5-fold cross-validation, outperforming existing CNN-based and transformer-based methods. The model also attains an AUC-ROC of 99.45%, indicating better discriminative ability across the CNV, DME, Drusen, and Normal classes. Extensive hyperparameter tuning of the dropout rate, encoder depth, and learning rate further improved accuracy and generalization. Grad-CAM visualizations add clinical interpretability by highlighting the decision-critical retinal regions, so that predictions can be checked for consistency with the pathological features ophthalmologists look for. Comparative analysis against current deep learning models confirms that the proposed ViT model delivers state-of-the-art performance while addressing the main shortcomings of earlier approaches: weak fine-grained classification, overfitting, and limited interpretability.
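
The abstract does not show how the Grad-CAM maps are computed, so the following is only a minimal sketch of the standard Grad-CAM weighting step, not the authors' implementation. It assumes the activations and the class-score gradients have already been extracted from the model as K small 2D maps (for a ViT this would typically mean reshaping patch tokens back onto their spatial grid, which is an assumption here). The function name `grad_cam` and the toy inputs are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Grad-CAM heatmap from K feature maps and the class-score gradients
    with respect to them (hypothetical helper, plain-Python illustration)."""
    if not activations or len(activations) != len(gradients):
        raise ValueError("need one gradient map per activation map")
    rows, cols = len(activations[0]), len(activations[0][0])
    # Channel weight alpha_k: global average of the k-th gradient map.
    weights = [sum(sum(row) for row in g) / (rows * cols) for g in gradients]
    # Weighted sum over channels, then ReLU to keep only positive evidence.
    cam = [
        [
            max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(cols)
        ]
        for i in range(rows)
    ]
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Toy example: two 2x2 channels, one with positive and one with negative gradients.
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 1.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
print(heatmap)  # [[0.5, 0.0], [0.0, 1.0]]
```

In the toy example the first channel supports the class and the second opposes it, so only the locations where the first channel is active survive the ReLU. In practice the resulting map would be upsampled to the OCT image size and overlaid on the scan.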

Version published to 10.21203/rs.3.rs-6510478/v1 on Research Square
Jun 25, 2025

Deep Learning Framework for Multiclass Detection of Ocular Diseases in Fundus Images

This article has 4 authors:
1. Shajila Beegam
2. Mala Kalra
3. Abhijit Bhowmik
4. Jibitesh Kumar Panda
This article has no evaluations. Latest version: Jun 30, 2025

Towards Sustainable Retinal Diagnostics A Deep Learning Alternative to OCT for Macular Thickness Estimation

This article has 2 authors:
1. C A Aparna
2. B R Manju
This article has no evaluations. Latest version: Jul 1, 2025

Eye Disease Classification by Advanced Deep Transfer Learning System Using Resnet50 & Xception

This article has 3 authors:
1. sairam N sunandhan
2. S Suthaanthiraa
3. Maheswari S Uma
This article has no evaluations. Latest version: Jul 15, 2025
