Enhancing Diabetic Retinopathy Prediction Using Transformer-based Attention in Hybrid CNN Models

Abstract

Diabetic retinopathy (DR) is one of the leading causes of blindness worldwide, so early and accurate detection is essential to prevent severe vision loss. In this study, we introduce a hybrid learning method that combines deep convolutional models with transformer-based attention mechanisms to improve diabetic retinopathy prediction. Our approach builds on an ensemble of pre-trained models, including InceptionV3, DenseNet121, VGG16, MobileNetV2, and ResNet50, each known for robust feature extraction. By incorporating self-attention and multi-head attention mechanisms into these hybrid models, we enhance feature representation and obtain higher classification accuracy. Our experimental findings indicate that such hybrid architectures capture intricate retinal patterns and outperform the individual architectures. Notably, combining ResNet50 and DenseNet121 with a transformer-based attention mechanism yielded the most stable accuracy and the most robust results. This paper demonstrates the potential of hybrid deep learning models augmented with attention mechanisms as a viable approach to improved diabetic retinopathy diagnosis. Our findings support the advancement of automated medical image analysis and strengthen clinical decision support systems for retinal disease detection.
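The sketch below illustrates, in broad strokes, the kind of hybrid architecture the abstract describes: two pre-trained CNN backbones (ResNet50 and DenseNet121) whose feature maps are flattened into token sequences, fused, and refined with multi-head self-attention before classification. This is not the authors' released code; the number of DR classes (5), the embedding size, the head count, and the fusion-by-concatenation choice are illustrative assumptions.

```python
# Minimal sketch of a hybrid CNN + transformer-attention classifier.
# Assumptions (not from the paper): 5 DR grades, 512-d tokens, 8 attention heads.
import torch
import torch.nn as nn
from torchvision import models


class HybridCNNAttention(nn.Module):
    def __init__(self, num_classes: int = 5, embed_dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Pre-trained backbones used as feature extractors (optionally fine-tuned).
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        densenet = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
        self.resnet_features = nn.Sequential(*list(resnet.children())[:-2])  # (B, 2048, H, W)
        self.densenet_features = densenet.features                           # (B, 1024, H, W)
        # Project both feature maps to a common token dimension.
        self.proj_resnet = nn.Conv2d(2048, embed_dim, kernel_size=1)
        self.proj_densenet = nn.Conv2d(1024, embed_dim, kernel_size=1)
        # Transformer-style multi-head self-attention over the fused token sequence.
        self.attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        self.classifier = nn.Linear(embed_dim, num_classes)

    @staticmethod
    def _to_tokens(fmap: torch.Tensor) -> torch.Tensor:
        # Flatten spatial positions into a token sequence: (B, C, H, W) -> (B, H*W, C).
        return fmap.flatten(2).transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens_r = self._to_tokens(self.proj_resnet(self.resnet_features(x)))
        tokens_d = self._to_tokens(self.proj_densenet(self.densenet_features(x)))
        tokens = torch.cat([tokens_r, tokens_d], dim=1)        # fuse the two backbones' tokens
        attended, _ = self.attention(tokens, tokens, tokens)   # self-attention over all tokens
        fused = self.norm(tokens + attended)                   # residual connection + layer norm
        pooled = fused.mean(dim=1)                              # global average over tokens
        return self.classifier(pooled)                          # logits over DR grades


# Usage example with a dummy batch of fundus-image-sized inputs:
# logits = HybridCNNAttention()(torch.randn(2, 3, 224, 224))  # shape (2, 5)
```

Other backbones named in the abstract (InceptionV3, VGG16, MobileNetV2) could be swapped in the same way by adjusting the projection layers' input channels.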
