MCR-SL: A Multimodal, Context-Rich Skin Lesion Dataset for Skin Cancer Diagnosis
Abstract
Well-annotated datasets are fundamental for developing robust artificial intelligence models, particularly in medicine. Many existing skin lesion datasets are limited in image diversity (containing only clinical or only dermoscopic images) or in metadata, which hinders their use for mimicking real-world clinical practice. The MCR-SL dataset is a new, meticulously curated dataset designed to address these limitations. It was collected from 60 subjects at the University Hospital of North Norway and comprises 779 clinical images and 1,352 dermoscopic images of 240 unique lesions. The lesion types included are nevus, seborrheic keratosis, basal cell carcinoma, actinic keratosis, atypical nevus, melanoma, squamous cell carcinoma, angioma, and dermatofibroma. Labels were established by combining the consensus of a panel of four dermatologists with histopathology reports for the 29 excised lesions, with histopathology serving as the gold standard for those lesions. The resulting dataset pairs clinical and dermoscopic images with rich clinical context, giving it a level of clinical relevance that surpasses many existing resources. MCR-SL thus provides a reliable foundation for validating artificial intelligence models and supports a more nuanced approach to automated skin lesion diagnosis that mirrors real-world clinical practice.