Primary Classification of Skin Diseases Using an Explainable Multimodal VLM with a DBSCAN-Centroid-Based Confidence Score

Abstract

This paper presents an explainable multimodal vision–language framework for the primary classification of skin diseases. Using compact vision–language models (VLMs)—Gemma 3 4B and Qwen 2.5 VL 7B—the system integrates synthetic skin tumor and lesion images with natural-language disease descriptions, grounding its predictions in lay-accessible dermatologic concepts to improve interpretability. Low-rank adaptation (LoRA) fine-tuning on the AI-Hub synthetic skin-tumor dataset demonstrates the feasibility of deploying such models in resource-constrained environments. Model performance is evaluated with standard quantitative metrics—accuracy, precision, recall, and F1-score—and a DBSCAN-centroid-based semantic confidence-scoring method is introduced that estimates a prediction's confidence from its similarity to cluster centroids in the image-embedding space. The experimental results show that lightweight multimodal VLMs can achieve stable and accurate performance on primary skin disease classification, indicating their potential as explainable, AI-assisted tools for dermatologic decision support.
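The abstract does not give the exact formulation of the DBSCAN-centroid-based confidence score. A minimal sketch of the general idea, assuming cosine similarity to the nearest cluster centroid (the function name, hyperparameters, and toy data below are illustrative, not taken from the paper):

```python
import numpy as np
from sklearn.cluster import DBSCAN


def centroid_confidence(train_emb, query_emb, eps=0.5, min_samples=5):
    """Cluster reference image embeddings with DBSCAN, then score a
    query embedding by cosine similarity to its nearest centroid."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(train_emb)
    # one centroid per cluster; label -1 marks DBSCAN noise points
    centroids = np.array([train_emb[labels == k].mean(axis=0)
                          for k in set(labels) if k != -1])
    # cosine similarity between the query and every centroid
    sims = centroids @ query_emb / (
        np.linalg.norm(centroids, axis=1) * np.linalg.norm(query_emb))
    return sims.max()  # confidence = similarity to the closest centroid


# toy example: two tight clusters of 2-D "embeddings"
rng = np.random.default_rng(0)
a = rng.normal(loc=[0.0, 0.0], scale=0.05, size=(20, 2))
b = rng.normal(loc=[3.0, 3.0], scale=0.05, size=(20, 2))
emb = np.vstack([a, b])

# a query near the second cluster should receive a high score
conf = centroid_confidence(emb, np.array([3.0, 3.1]), eps=0.3, min_samples=4)
```

In this reading, a query embedding that falls far from every cluster centroid yields a low score, flagging the prediction as unreliable; the paper's actual formulation may differ in the distance metric or normalization.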
