Multi-resolution vision transformer model for skin cancer subtype classification using histopathology slides
Abstract
Introduction
Digital pathology has substantially advanced cancer diagnosis by enabling high-resolution visualisation and assessment of tissue specimens. However, manual analysis of these images remains labour-intensive and susceptible to human error, leading to inconsistencies in diagnosis and treatment decisions. Herein, we developed and externally validated a multi-resolution model for classifying skin cancer subtypes from whole slide images (WSIs).
Methods
We constructed a dataset comprising approximately 1.13 million histological patches (obtained by dividing the WSIs into non-overlapping tiles) from the Non-Melanoma Skin Cancer Segmentation (NMSCS) and Heidelberg datasets. All patches were normalised using the Macenko method before training a self-supervised vision transformer-based model to classify the most common subtypes of skin cancer: basal cell carcinoma (BCC), squamous cell carcinoma (SCC), intraepidermal carcinoma (IEC), melanoma, naevi, and non-cancerous tissue. Our multi-resolution model was designed to classify melanoma and non-melanoma skin cancer subtypes, incorporating data at 10x, 20x, 40x, and 400x magnifications. The model was externally validated on 5,147 slides from 4,066 patients for non-melanoma cancer subtypes. The model's performance was evaluated using classification metrics, and the quadratic weighted Cohen's kappa (κ) score was used to measure agreement between the model's predictions and the actual labels, with 95% confidence intervals (CIs).
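To make the stain-normalisation step concrete, below is a minimal NumPy sketch of Macenko normalisation applied to a single RGB patch, following the widely used public reference implementation rather than the paper's own code. The reference stain matrix and maximum-concentration values are standard defaults from that implementation, not parameters reported here.

```python
import numpy as np

# Reference H&E stain matrix and max concentrations from the common
# Macenko reference implementation (assumed defaults, not paper values).
HE_REF = np.array([[0.5626, 0.2159],
                   [0.7201, 0.8012],
                   [0.4062, 0.5581]])
MAX_C_REF = np.array([1.9705, 1.0308])


def macenko_normalize(patch, alpha=1.0, beta=0.15):
    """Normalise the stain appearance of an RGB patch (H, W, 3, uint8)."""
    h, w, _ = patch.shape
    rgb = patch.reshape(-1, 3).astype(np.float64)

    # 1. Convert to optical density (OD); +1 avoids log(0).
    od = -np.log10((rgb + 1.0) / 255.0)

    # 2. Keep only tissue pixels (OD above the transparency threshold).
    tissue = od[np.all(od > beta, axis=1)]
    if tissue.shape[0] < 2:
        return patch  # background-only tile: nothing to normalise

    # 3. Plane spanned by the two largest eigenvectors of the OD covariance.
    _, eig_vecs = np.linalg.eigh(np.cov(tissue.T))
    plane = eig_vecs[:, 1:3]  # eigh returns eigenvalues in ascending order

    # 4. Project onto the plane and take robust angular extremes.
    proj = tissue @ plane
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    phi_min, phi_max = np.percentile(phi, (alpha, 100 - alpha))
    v1 = plane @ np.array([np.cos(phi_min), np.sin(phi_min)])
    v2 = plane @ np.array([np.cos(phi_max), np.sin(phi_max)])

    # 5. Order the stain vectors (haematoxylin first, by convention).
    he = np.column_stack((v1, v2)) if v1[0] > v2[0] else np.column_stack((v2, v1))

    # 6. Per-pixel stain concentrations via least squares.
    conc, *_ = np.linalg.lstsq(he, od.T, rcond=None)

    # 7. Rescale to the reference maxima and rebuild the RGB image.
    max_c = np.percentile(conc, 99, axis=1)
    conc *= (MAX_C_REF / max_c)[:, None]
    norm = 255.0 * 10 ** (-HE_REF @ conc)
    return np.clip(norm.T, 0, 255).astype(np.uint8).reshape(h, w, 3)
```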
Results
Our multi-resolution model demonstrated strong classification performance across the six classes, achieving overall κ scores of 0.859 (95% CI: 0.851, 0.866) and 0.898 (95% CI: 0.892, 0.904) on the validation and testing sets, respectively, reflecting robust performance across diverse skin cancer subtypes. The multi-resolution model for non-melanoma skin cancer exhibited superior performance, achieving an overall κ score of 0.919 (95% CI: 0.914, 0.924) on the validation set. On the testing set, κ scores ranged from 0.889 to 0.996 across the 10x, 20x, 40x, and 400x magnifications. The attention maps highlighted clinically relevant features of cancerous tissue at different magnifications. Additionally, the model obtained a κ score of 0.791 (95% CI: 0.774, 0.808) on the external data at the slide level, indicating substantial agreement between the model's predictions and the actual WSI labels.
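The abstract reports κ with 95% CIs but does not state how the intervals were computed; a common choice is a nonparametric bootstrap over the evaluation set. The sketch below shows one such procedure using scikit-learn's quadratic weighted kappa; the function name and bootstrap settings are illustrative assumptions, not the paper's method.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score


def quadratic_kappa_ci(y_true, y_pred, n_boot=2000, seed=0):
    """Quadratic weighted Cohen's kappa with a bootstrap 95% CI."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    kappa = cohen_kappa_score(y_true, y_pred, weights="quadratic")

    rng = np.random.default_rng(seed)
    n = len(y_true)
    boot = []
    for _ in range(n_boot):
        # Resample cases with replacement and recompute kappa.
        idx = rng.integers(0, n, n)
        boot.append(cohen_kappa_score(y_true[idx], y_pred[idx],
                                      weights="quadratic"))
    lo, hi = np.percentile(boot, (2.5, 97.5))
    return kappa, (lo, hi)
```

Resampling at the slide level (rather than the patch level) would be the appropriate unit for the slide-level external validation figures.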
Conclusion
Our multi-resolution model has the potential to assist anatomical pathologists by automatically detecting, highlighting, and classifying subtypes of melanoma and non-melanoma skin cancer directly from WSIs. This capability could ultimately improve patient outcomes and support more effective clinical decision-making in digital pathology. For non-melanoma cancer, our model could be deployed in regions with limited access to experienced dermatopathologists and a high incidence of the disease, particularly in low-resource settings.