Comparative Analysis of Deep Learning Models for Coronary Artery Segmentation: Performance and Inference Time Evaluation
Abstract
Background: Coronary artery disease (CAD) is one of the leading causes of cardiovascular mortality worldwide. Accurate segmentation of coronary arteries from X-ray coronary angiography (XCA) images is crucial for assessing vessel morphology and stenosis, thereby supporting computer-aided diagnosis and guiding interventional treatment decisions. Although recent studies have focused primarily on improving segmentation accuracy with deep learning models, limited attention has been given to evaluating inference time — a factor that is equally important for clinical deployment and real-time decision support.

Objective: This study compares both the segmentation performance and the inference time of U-Net, U-Net++, and SegFormer on the ARCADE XCA dataset (stenosis and SYNTAX subsets).

Methods: All XCA images were resized to 256 × 256 pixels, normalized, and augmented prior to training. The U-Net and U-Net++ architectures were implemented as convolutional encoder–decoder networks with skip connections, whereas SegFormer employed a hierarchical Transformer-based encoder coupled with a lightweight MLP decoder. All models were trained for 100 epochs using cross-entropy loss with class-balancing weights. Performance was evaluated in terms of segmentation accuracy, Dice score, and per-image inference time.

Results: On the stenosis subset, U-Net and U-Net++ achieved the highest training accuracy (99.82%), while SegFormer attained a slightly lower accuracy (99.15%) but delivered the fastest inference time (0.05 s per image). On the SYNTAX subset, U-Net++ obtained the best training accuracy (98.13%), followed closely by U-Net (98.04%) and SegFormer (97.00%). Despite its lower accuracy, SegFormer consistently demonstrated superior efficiency, achieving the shortest inference time (0.18 s per image).

Conclusion: U-Net++ demonstrated the highest segmentation accuracy, SegFormer provided the largest runtime advantage, and U-Net achieved a balanced trade-off between the two.
Taken together, these findings suggest that model selection should be informed by the specific priorities of clinical deployment, whether accuracy, inference speed, or a compromise between both is most critical.
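The class-balancing weights mentioned in the Methods are commonly derived by inverse-frequency weighting, which counteracts the severe foreground/background imbalance typical of vessel masks (vessel pixels occupy only a small fraction of each 256 × 256 image). The function below is a minimal NumPy sketch of that idea; it is an illustrative assumption, not the authors' actual implementation, and the function name is hypothetical.

```python
import numpy as np

def class_balance_weights(mask: np.ndarray, num_classes: int) -> np.ndarray:
    """Inverse-frequency weights for a class-weighted cross-entropy loss.

    `mask` holds integer class labels per pixel (0 = background).
    Classes absent from the mask receive a weight of 0; present classes
    get weights proportional to the inverse of their pixel frequency,
    scaled so the weights of a perfectly balanced mask would all be 1.
    """
    counts = np.bincount(mask.ravel(), minlength=num_classes).astype(float)
    weights = np.zeros(num_classes)
    present = counts > 0
    # total_pixels / (n_present_classes * class_count): rarer classes weigh more
    weights[present] = counts.sum() / (present.sum() * counts[present])
    return weights

# Hypothetical example: a mask where vessel pixels (class 1) are rare,
# so the vessel class receives a much larger weight than the background.
mask = np.zeros((256, 256), dtype=np.int64)
mask[100:110, 100:110] = 1  # 100 "vessel" pixels out of 65,536
w = class_balance_weights(mask, num_classes=2)
```

In a training loop, such a weight vector would typically be passed to the loss function (e.g. the `weight` argument of a framework's cross-entropy loss) so that rare vessel pixels contribute proportionally more to the gradient.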