Benchmarking Deep Learning Models for Real-Time Diabetic Retinal Blood Vessel Segmentation
Abstract
Background
Diabetic retinopathy (DR) remains one of the leading causes of preventable blindness worldwide. Accurate segmentation of retinal blood vessels is essential for early DR detection, as vascular abnormalities provide key markers of disease onset and progression. Although recent deep learning (DL) methods have achieved strong segmentation accuracy, limited attention has been given to benchmarking their inference efficiency, a critical factor for real-time clinical deployment in large-scale screening and teleophthalmology.

Objective
This study systematically benchmarks U-Net, U-Net++, and SegFormer on the DRIVE dataset to jointly evaluate segmentation accuracy and inference time, thereby addressing the gap between performance reporting and practical clinical applicability.

Methods
All images were resized to 256×256 pixels, normalized, and augmented with rotations, flips, and scaling. U-Net and U-Net++ were implemented as convolutional encoder–decoder architectures with skip connections, while SegFormer employed a hierarchical Transformer backbone with a lightweight MLP decoder. Models were trained for 60 epochs using class-balanced cross-entropy loss. Evaluation metrics included pixel accuracy, Dice similarity coefficient (DSC), and per-image inference time.

Results
U-Net++ achieved the highest Dice score (DSC = 0.850; accuracy = 0.9778), narrowly outperforming U-Net (DSC = 0.847; accuracy = 0.9783), while SegFormer was less accurate (DSC = 0.637; accuracy = 0.9106) but delivered the fastest inference (0.67 s per image), roughly 11× faster than U-Net++ and 4.8× faster than U-Net. Qualitative analysis confirmed that U-Net++ best preserved thin vessels and vascular continuity, whereas SegFormer tended to thicken vessel boundaries and omit fine branches.
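The evaluation metrics named in the Methods (pixel accuracy, Dice similarity coefficient, and per-image inference time) can be sketched as follows. This is a minimal NumPy illustration, not the study's actual code; the function names and the eps smoothing term are assumptions made for this example.

```python
import time
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient for binary vessel masks (1 = vessel).

    DSC = 2|P ∩ T| / (|P| + |T|); eps avoids division by zero on empty masks
    (an assumed convention, not specified in the paper).
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class matches the ground truth."""
    return float((pred.astype(bool) == target.astype(bool)).mean())

def timed_inference(model_fn, image):
    """Run one forward pass and return (mask, elapsed seconds per image)."""
    start = time.perf_counter()
    mask = model_fn(image)  # model_fn is any callable producing a binary mask
    return mask, time.perf_counter() - start
```

In practice the per-image timing would wrap the trained model's forward pass (and, for GPU inference, a device synchronization) rather than an arbitrary callable.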
Conclusion
U-Net++ demonstrated superior segmentation accuracy, SegFormer provided a substantial runtime advantage, and U-Net offered a balanced trade-off between quality and efficiency. These findings highlight that model selection for retinal vessel segmentation should depend on the specific priorities of deployment: precision, inference speed, or a compromise between the two.