Optimizing Deep Learning for Skin Cancer: A Comparative Study of Convolutional and Attention-Based Models
Abstract
Skin cancer is among the most prevalent malignancies worldwide, with over 1.5 million new cases estimated in 2022 alone according to GLOBOCAN data [1]. Despite the availability of dermoscopy, experienced dermatologists achieve a melanoma detection sensitivity of approximately 75–84% by visual examination, a rate that underscores the diagnostic limitations of unaided clinical assessment [2]. This study presents a systematic comparison of five deep learning architectures for the automated classification of seven skin lesion types on the HAM10000 dataset [6], which comprises 10,015 dermoscopic images. We evaluate five architectures spanning both convolutional and attention-based paradigms: ResNet-50, EfficientNet-B4, ConvNeXt-Base, Swin Transformer-Base, and Vision Transformer (ViT-B/16). To address the pronounced class imbalance inherent in the dataset, we employed patient-level data partitioning via GroupShuffleSplit to prevent lesion leakage across splits, together with a WeightedRandomSampler during training. All models were trained with AdamW optimization, label smoothing, and mixed-precision training. Transformer-based architectures were further stabilized through linear warmup scheduling and stochastic depth regularization. Our best single model, ViT-B/16, achieved a test accuracy of 85.66% and a macro AUC-ROC of 0.9629. An ensemble of EfficientNet-B4 and Swin Transformer-Base achieved the highest overall performance, with a test accuracy of 86.57%, a balanced accuracy of 79.98%, a macro F1-score of 0.7856, and a macro AUC-ROC of 0.9811. These results demonstrate that heterogeneous ensembles combining architecturally diverse models offer a meaningful improvement over individual classifiers in dermoscopic lesion classification.
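The leakage-safe split and class rebalancing described above can be sketched as follows. This is a minimal illustration, not the authors' code: the toy metadata frame and its column names (`lesion_id`, `dx`, mirroring the HAM10000 metadata CSV) are assumptions, and the inverse-frequency weights stand in for what PyTorch's `WeightedRandomSampler` would consume during training.

```python
# Hypothetical sketch of patient/lesion-level splitting (GroupShuffleSplit)
# and inverse-frequency sampling weights, as described in the abstract.
# Column names follow the HAM10000 metadata convention but are assumptions.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-in for the HAM10000 metadata table.
meta = pd.DataFrame({
    "image_id": [f"img_{i}" for i in range(10)],
    "lesion_id": ["les_0", "les_0", "les_1", "les_2", "les_2",
                  "les_3", "les_4", "les_5", "les_5", "les_6"],
    "dx": ["nv", "nv", "mel", "nv", "nv", "bkl", "mel", "nv", "nv", "bcc"],
})

# GroupShuffleSplit keeps every image of a given lesion in the same split,
# preventing lesion-level leakage between train and test.
gss = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=42)
train_idx, test_idx = next(gss.split(meta, groups=meta["lesion_id"]))
train, test = meta.iloc[train_idx], meta.iloc[test_idx]
assert set(train["lesion_id"]).isdisjoint(test["lesion_id"])

# Inverse class-frequency weights: rare classes are drawn more often.
# These per-sample weights would be passed to a WeightedRandomSampler.
counts = train["dx"].value_counts()
sample_weights = (1.0 / counts)[train["dx"]].to_numpy()
```

In a real pipeline the weights would be wrapped as `WeightedRandomSampler(sample_weights, num_samples=len(train), replacement=True)` and handed to the training `DataLoader`.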