Out-of-Distribution Performance Analysis of Skin Lesion Classifiers for Dermoscopic Images

Eva Milara
Vanesa Gómez-Martínez
David Chushig-Muzo
María Castro-Fernández
Gustavo M. Callico
Conceição Granja
Cristina Soguero-Ruiz

Abstract

Background: The availability of public skin lesion image datasets has enabled rapid progress in classification tasks. However, models trained on data with similar characteristics (in-distribution, ID, data) often struggle to generalize to new and different data, which limits their utility in clinical settings. New methods are therefore needed to assess algorithm performance and trustworthiness on out-of-distribution (OOD) data.

Objective: This study evaluates the generalization capacity and robustness of deep learning models for the binary classification (malignant vs. non-malignant) of skin lesions by assessing their performance and predictive confidence in OOD settings.

Methods: Four convolutional neural networks (CNNs), namely AlexNet, VGG, ResNet, and DenseNet, are trained on public datasets, which serve as the ID group. Their performance and reliability under distribution shift are then evaluated on private datasets, which are treated as OOD cohorts.

Results: The VGG model achieves the best overall performance on the ID test set (AUROC = 0.895) and maintains balanced performance across the OOD datasets. However, domain shift analysis reveals marked performance drops in specific domains, particularly those with strong distributional shifts in age and diagnosis.

Conclusions: The results underscore the need for domain-aware evaluation and for models trained on more diverse and representative datasets to ensure generalization across clinically relevant populations.
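The evaluation described in the abstract compares each classifier's AUROC on the ID test set with its AUROC on each OOD cohort, alongside its predictive confidence. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it assumes each model emits a sigmoid probability of malignancy, it defines predictive confidence as max(p, 1 - p) (the abstract does not specify the measure used), and the names `auroc`, `mean_confidence`, `DomainReport`, and `compare_domains` are hypothetical.

```python
from collections.abc import Sequence
from dataclasses import dataclass


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random malignant case (label 1)
    receives a higher score than a random non-malignant case (label 0).
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_confidence(scores: Sequence[float]) -> float:
    """Assumed confidence measure for a binary sigmoid output:
    the probability of the predicted class, max(p, 1 - p), averaged."""
    return sum(max(p, 1.0 - p) for p in scores) / len(scores)


@dataclass
class DomainReport:
    name: str
    auroc: float
    confidence: float
    auroc_drop: float  # ID AUROC minus this cohort's AUROC


def compare_domains(
    cohorts: dict[str, tuple[list[int], list[float]]], id_name: str
) -> list[DomainReport]:
    """Hypothetical helper: score every cohort and report its AUROC drop
    relative to the in-distribution cohort named `id_name`."""
    id_labels, id_scores = cohorts[id_name]
    id_auc = auroc(id_labels, id_scores)
    reports = []
    for name, (labels, scores) in cohorts.items():
        auc = auroc(labels, scores)
        reports.append(DomainReport(name, auc, mean_confidence(scores), id_auc - auc))
    return reports
```

A pairwise AUROC is quadratic in cohort size but makes the definition explicit; for real cohorts a rank-based implementation such as scikit-learn's `roc_auc_score` gives the same value more efficiently. Reporting confidence next to the AUROC drop helps flag cohorts where a model stays confident while its discrimination degrades.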

Version published on Research Square (10.21203/rs.3.rs-7544969/v1), Sep 9, 2025

Related articles

Clinical Application of Vision Transformers for Melanoma Classification: A Multi-Dataset Evaluation Study

This article has 5 authors:
1. Antony Garcia
2. Jixing Zhou
3. Gabriela Pinero-Crespo
4. Thomas Beachkofsky
5. Xinming Huang
This article has no evaluations. Latest version Oct 6, 2025

DermFusionX: An Explainable CNN–MLP Late Fusion Framework for Multimodal Skin Lesion Classification

This article has 1 author:
1. Vanshika Sharma
This article has no evaluations. Latest version Sep 25, 2025

Sun-Exposure and Lesion Location Bias in Deep Learning Models for Skin Cancer Detection

This article has 5 authors:
1. Eva Milara
2. Vanesa Gómez-Martínez
3. David Chushig-Muzo
4. Conceição Granja
5. Cristina Soguero-Ruiz
This article has no evaluations. Latest version Sep 11, 2025
