Hybrid Deep Ensemble Architecture for Robust Diabetic Retinopathy Classification: Leveraging Transfer Learning and CNN-Transformer Synergy
Abstract
Diabetic Retinopathy (DR) remains one of the leading causes of vision loss worldwide, particularly in regions where access to regular eye examinations is limited. Early and accurate detection is essential to prevent permanent damage, but traditional screening methods are slow and sometimes inconsistent. This study proposes a deep learning framework that combines convolutional neural networks (CNNs), vision transformers, transfer learning, and ensemble techniques to improve DR detection. Using the APTOS 2019 dataset, we evaluated 23 different pre-trained models, fine-tuned the top performers, and designed hybrid architectures that combine the best-performing CNNs and transformers in parallel and sequential configurations to capture both local spatial features and long-range context. The best performance came from combining the top sequential hybrid models with soft voting, achieving an accuracy of 93.10%, a ROC AUC of 99.22%, and an F1-score of 93.07%. These results show that mixing complementary architectures and applying ensemble methods leads to more accurate and stable DR detection. Our approach is a step toward a reliable, automated system that could support clinicians in real-world settings.
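The soft-voting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the probability arrays are toy values standing in for the per-class outputs of the fine-tuned hybrid models, and the five classes correspond to the standard APTOS 2019 DR grades (0 = no DR through 4 = proliferative DR).

```python
import numpy as np

def soft_vote(prob_list):
    """Soft-voting ensemble: average each model's class-probability
    vectors across models, then take the argmax per sample as the
    final prediction."""
    avg = np.mean(np.stack(prob_list), axis=0)  # (n_samples, n_classes)
    return avg.argmax(axis=1)

# Toy probabilities from two hypothetical member models over three
# fundus images and the 5 DR grades (0 = no DR ... 4 = proliferative).
p1 = np.array([[0.70, 0.10, 0.10, 0.05, 0.05],
               [0.05, 0.60, 0.20, 0.10, 0.05],
               [0.10, 0.10, 0.20, 0.30, 0.30]])
p2 = np.array([[0.60, 0.20, 0.10, 0.05, 0.05],
               [0.10, 0.30, 0.40, 0.10, 0.10],
               [0.05, 0.05, 0.10, 0.20, 0.60]])

print(soft_vote([p1, p2]))  # -> [0 1 4]
```

Averaging probabilities (rather than hard-voting on labels) lets a confident model outvote an uncertain one, which is why soft voting is often the more stable choice for combining heterogeneous architectures.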