Graph Neural Networks for Multi-modal Skin Lesion Classification Using Metadata and Visual Features

Abstract

Accurate classification of skin lesions is crucial for early detection of melanoma and other skin cancers. However, challenges such as inter-class similarity, intra-class variation, and limited labeled data persist. We propose a novel multi-modal graph neural network (GNN) that integrates three complementary modalities: deep visual features from pre-trained convolutional neural networks (CNNs), handcrafted descriptors (e.g., HSV histograms, fractal features), and structured clinical metadata (e.g., age, sex, lesion location). Each lesion sample is represented as a node in a \(k\)-nearest neighbor graph constructed in the fused multi-modal feature space. A two-layer graph convolutional network performs relational learning and classification. The model is trained with a class-weighted cross-entropy loss and evaluated using repeated stratified 5-fold cross-validation. Experiments on the HAM10000 and ISIC2020 datasets show that our method consistently outperforms strong baselines. On HAM10000, the proposed GNN achieves an accuracy of 0.960 \(\pm\) 0.003, an F1 score of 0.959 \(\pm\) 0.004, and an AUC of 0.999 \(\pm\) 0.000. On ISIC2020, it reaches 0.999 \(\pm\) 0.001 in both F1 and accuracy, with an AUC of 1.000. These results validate the effectiveness of multi-modal fusion and graph-based reasoning for skin lesion diagnosis and suggest the method's potential for real-world clinical deployment.
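The pipeline described above (concatenate the three modalities, build a \(k\)-nearest neighbor graph over lesions, classify nodes with a two-layer GCN trained under class-weighted cross-entropy) can be illustrated with a minimal sketch. The sketch below assumes PyTorch Geometric; the feature dimensions, \(k = 10\), hidden width, random placeholder data, and the single random train split are illustrative assumptions, not values reported in the abstract.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, knn_graph


class TwoLayerGCN(torch.nn.Module):
    """Two-layer graph convolutional network over the lesion graph."""

    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)  # per-node (per-lesion) class logits


# Placeholder features standing in for the three modalities; all
# dimensions are illustrative assumptions.
n = 1000
visual = torch.randn(n, 512)       # deep CNN embeddings
handcrafted = torch.randn(n, 64)   # e.g., HSV histograms, fractal features
metadata = torch.randn(n, 8)       # encoded age, sex, lesion location
y = torch.randint(0, 7, (n,))      # 7 classes, as in HAM10000

# Fuse the modalities by concatenation and build the k-NN graph in the
# fused feature space (k = 10 is an assumed value).
x = torch.cat([visual, handcrafted, metadata], dim=1)
edge_index = knn_graph(x, k=10)

# Inverse-frequency class weights for the class-weighted cross-entropy.
counts = torch.bincount(y, minlength=7).float()
class_weights = counts.sum() / (counts.numel() * counts)

model = TwoLayerGCN(x.size(1), hidden_dim=64, num_classes=7)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on the labeled nodes; a random split stands in for
# one fold of the repeated stratified 5-fold cross-validation.
train_mask = torch.rand(n) < 0.8
logits = model(x, edge_index)
loss = F.cross_entropy(logits[train_mask], y[train_mask], weight=class_weights)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Concatenation is one plausible fusion scheme consistent with the abstract's "fused multi-modal feature space"; the paper itself may use a different fusion operator.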
