A Novel Multi-Stage Fusion Pipeline for Robust and Interpretable Melanoma Classification Using Physics-Informed and Vision-Language Models
Abstract
Melanoma, a highly aggressive form of skin cancer, requires early and accurate diagnosis to improve patient survival, yet existing deep learning methods often struggle with noise, limited labeled data, poor calibration, and cross-domain generalization. This paper proposes a novel, modular AI-driven pipeline for melanoma classification from dermoscopic images that integrates physics-informed preprocessing, diffusion–transformer-based segmentation, hybrid self-supervised representation learning, privacy-preserving federated classification, and uncertainty-aware explainability. Physics-Informed AI-Based Denoising Preprocessing (PIAIDP) effectively suppresses hair, glare, and illumination artifacts while preserving lesion boundaries, achieving 32.4 dB PSNR, 0.93 SSIM, and the lowest Hair IoU Loss of 0.094. Diffusion–Transformer Segmentation Fusion (DTSF) combines denoising diffusion models with TransUNet via multi-scale cross-attention, attaining a Dice score of 0.902, Jaccard index of 0.849, and reduced boundary error (HD95 = 7.6). From the segmented lesions, Contrastive Self-Supervised Hybrid Graph–ViT Embedding (CSS-HGVE) fuses structural graph representations with transformer-based visual features, yielding the highest linear probe accuracy of 90.8% and improved class separability (silhouette score = 0.51). For classification, the Prompt-Guided Multi-Task Federated Classifier (PG-MTFC) leverages domain-guided CLIP prompts and Evidential Deep Learning to jointly predict melanoma classes and quantify uncertainty. Under non-IID federated settings with differential privacy (ε ≤ 2.0), the proposed model achieves 92.6% accuracy, F1-score of 0.908, and AUROC of 0.949, with superior calibration (ECE = 0.050). The Uncertainty-Aware Multi-Modal Explainability (UAMME) module further enhances interpretability, achieving higher faithfulness (0.768) and clinician trust (4.4/5). 
Extensive evaluation on the ISIC 2018, PH2, and Derm7pt datasets demonstrates improved robustness, fairness, and cross-dataset generalization, supporting the framework's applicability to trustworthy, privacy-preserving clinical melanoma analysis.
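For readers less familiar with the evaluation metrics reported above, the following is a minimal NumPy sketch of how the Dice score, Jaccard index, and expected calibration error (ECE) are typically computed; the function names, binary-mask inputs, and 10-bin ECE scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dice_jaccard(pred, gt):
    """Dice score and Jaccard index for binary segmentation masks (illustrative)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    jaccard = inter / np.logical_or(pred, gt).sum()
    return dice, jaccard

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the weighted mean
    absolute gap between average confidence and accuracy per bin (illustrative)."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```

A lower ECE (such as the 0.050 reported for PG-MTFC) indicates that predicted confidences track empirical accuracy closely, which matters for clinical decision support.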