Deep Learning for Diabetic Retinopathy Detection: A Review of Multimodal Data Fusion Approaches
Abstract
Diabetic retinopathy (DR) is a diabetes-induced eye disease that affects the blood vessels of the retina, and a lack of timely DR detection can result in vision loss. Although deep learning (DL) has successfully analyzed single-modality medical data, DR diagnosis often requires interpreting diverse information such as retinal imaging and clinical data. Multimodal data fusion has the potential to combine robust and complementary information from these sources for more accurate diagnostic decisions. However, DR detection using deep learning-based multimodal fusion remains challenging and underdeveloped. This review investigates recent advances in applying DL techniques to multimodal DR detection, focusing on model architectures, modality combinations, fusion strategies, and performance metrics. Among these architectures, convolutional neural networks (CNNs) are the most popular, and the fusion of fundus images with optical coherence tomography (OCT) or electronic health record (EHR) data is the most common pairing. Early and joint fusion strategies dominate, while model performance is typically assessed using accuracy, AUC, sensitivity, and F1-score. Despite promising progress, the field still faces challenges including modality heterogeneity, a lack of standardized multimodal datasets, and limited model interpretability. Emerging trends point toward hybrid architectures, attention mechanisms, and self-supervised learning as potential solutions. This review highlights current developments and outlines future directions to support the design of scalable, generalizable, and clinically applicable multimodal DL systems for DR detection.
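The distinction between the early and joint fusion strategies mentioned above can be illustrated with a minimal sketch. This is not any specific model from the reviewed literature; the feature dimensions and linear "encoders" are hypothetical stand-ins (plain NumPy in place of a CNN branch for fundus images and an MLP branch for clinical records). Early fusion concatenates the raw per-modality features before a single predictor, while joint fusion first encodes each modality separately and fuses the learned representations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: a fundus-image feature vector and a clinical-record vector.
fundus_features = rng.normal(size=512)   # e.g. embedding of a fundus image
clinical_features = rng.normal(size=16)  # e.g. age, HbA1c, diabetes duration, ...

# Early fusion: concatenate raw features, then apply one classifier head.
early_input = np.concatenate([fundus_features, clinical_features])  # shape (528,)
w_early = rng.normal(size=early_input.shape[0])
early_score = 1 / (1 + np.exp(-(early_input @ w_early)))  # sigmoid DR probability

# Joint fusion: encode each modality separately, then fuse the latent vectors.
w_img = rng.normal(size=(512, 32))   # stand-in for a CNN image encoder
w_clin = rng.normal(size=(16, 32))   # stand-in for an MLP clinical encoder
z_img = np.tanh(fundus_features @ w_img)
z_clin = np.tanh(clinical_features @ w_clin)
joint_input = np.concatenate([z_img, z_clin])  # shape (64,)
w_joint = rng.normal(size=joint_input.shape[0])
joint_score = 1 / (1 + np.exp(-(joint_input @ w_joint)))
```

In practice the joint variant is trained end to end, so the two encoders learn modality-specific representations whose fused dimensionality (here 64) is far smaller than the early-fusion input (here 528), which is one reason joint fusion handles heterogeneous modalities more gracefully.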