Clinical Application of Vision Transformers for Melanoma Classification: A Multi-Dataset Evaluation Study

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Background: Melanoma is one of the most lethal skin cancers, with survival rates largely dependent on early detection, yet diagnosis remains difficult because of its visual similarity to benign nevi. Convolutional neural networks have achieved strong performance in dermoscopic analysis but often depend on fixed input sizes and local features, which can limit generalization. Vision Transformers, which capture global image relationships through self-attention, offer a promising alternative. Methods: A ViT-L/16 model was fine-tuned using the ISIC 2019 dataset containing more than 25,000 dermoscopic images. To expand the dataset and balance class representation, synthetic melanoma and nevus images were produced with StyleGAN2-ADA, retaining only high-confidence outputs. Model performance was evaluated on an external biopsy-confirmed dataset (MN187) and compared with CNN baselines (ResNet-152, DenseNet-201, EfficientNet-B7, ConvNeXt-XL, ViT-B/16) and the commercial MoleAnalyzer Pro system using ROC-AUC and DeLong’s test. Results: The ViT-L/16 model reached a baseline ROC-AUC of 0.902 on MN187, surpassing all CNN baselines and the MoleAnalyzer Pro system, though the difference was not statistically significant (p = 0.07). After adding 46,000 confidence-filtered GAN-generated images, the ROC-AUC increased to 0.915, giving a statistically significant improvement over the commercial MoleAnalyzer Pro system (p = 0.032). Conclusions: Vision Transformers show strong potential for melanoma classification, especially when combined with GAN-based augmentation, offering advantages in global feature representation and data expansion that support the development of dependable AI-driven clinical decision-support systems.

Article activity feed