A Novel Multi-Stage Fusion Pipeline for Robust and Interpretable Melanoma Classification Using Physics-Informed and Vision-Language Models
Abstract
Melanoma, a highly aggressive form of skin cancer, requires early and accurate diagnosis to improve patient survival, yet existing deep learning methods often struggle with noise, limited labeled data, poor calibration, and cross-domain generalization. This paper proposes a novel, modular AI-driven pipeline for melanoma classification from dermoscopic images that integrates physics-informed preprocessing, diffusion–transformer-based segmentation, hybrid self-supervised representation learning, privacy-preserving federated classification, and uncertainty-aware explainability. Physics-Informed AI-Based Denoising Preprocessing (PIAIDP) effectively suppresses hair, glare, and illumination artifacts while preserving lesion boundaries, achieving 32.4 dB PSNR, 0.93 SSIM, and the lowest Hair IoU Loss of 0.094. Diffusion–Transformer Segmentation Fusion (DTSF) combines denoising diffusion models with TransUNet via multi-scale cross-attention, attaining a Dice score of 0.902, Jaccard index of 0.849, and reduced boundary error (HD95 = 7.6). From the segmented lesions, Contrastive Self-Supervised Hybrid Graph–ViT Embedding (CSS-HGVE) fuses structural graph representations with transformer-based visual features, yielding the highest linear probe accuracy of 90.8% and improved class separability (silhouette score = 0.51). For classification, the Prompt-Guided Multi-Task Federated Classifier (PG-MTFC) leverages domain-guided CLIP prompts and Evidential Deep Learning to jointly predict melanoma classes and quantify uncertainty. Under non-IID federated settings with differential privacy (ε ≤ 2.0), the proposed model achieves 92.6% accuracy, F1-score of 0.908, and AUROC of 0.949, with superior calibration (ECE = 0.050). The Uncertainty-Aware Multi-Modal Explainability (UAMME) module further enhances interpretability, achieving higher faithfulness (0.768) and clinician trust (4.4/5). 
Extensive evaluation on the ISIC 2018, PH2, and Derm7pt datasets demonstrates improved robustness, fairness, and cross-dataset generalization, supporting the framework's applicability to trustworthy, privacy-preserving clinical melanoma analysis.
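For readers less familiar with the evaluation metrics reported above, the following is a minimal NumPy sketch of how the Dice score, Jaccard index, and expected calibration error (ECE) are typically computed; the function names, binary-mask inputs, and 10-bin ECE scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dice_jaccard(pred, gt):
    """Dice score and Jaccard index for binary segmentation masks (illustrative)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    jaccard = inter / np.logical_or(pred, gt).sum()
    return dice, jaccard

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the weighted mean
    absolute gap between average confidence and accuracy per bin (illustrative)."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```

A lower ECE (such as the 0.050 reported for PG-MTFC) indicates that predicted confidences track empirical accuracy closely, which matters for clinical decision support.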