Pre- and Post-Gated Attention-based Multimodal Fusion for Skin Lesion Classification
Abstract
Accurate classification of skin lesions plays an important role in helping physicians diagnose skin cancer promptly and in supporting clinical decision-making. However, multimodal data sources (images, handcrafted features, and clinical metadata) are often heterogeneous and noisy, which makes fusing them challenging. In this study, we propose a Pre- and Post-Gated Attention-based Multimodal Fusion (PPG-AFM) model. The Pre-Gating mechanism is designed to filter noise before inter-modal relations are computed via self-attention. The Post-Gating mechanism, combined with softmax pooling controlled by a temperature parameter τ, enables adaptive weighting after fusion, thereby enhancing robustness and generalization. In addition, modality dropout is integrated to simulate scenarios in which some modalities are missing. Experiments are conducted on two benchmark datasets, HAM10000 and ISIC 2019. Results show that PPG-AFM outperforms several popular baselines (early fusion, late fusion, weighted late fusion, ensemble, and pure attention). On HAM10000, PPG-AFM improves Macro-F1 by approximately 3–4% over late fusion. On ISIC 2019, the model achieves an average Accuracy of 53.2% and a Macro-F1 of 39.4%, compared with 50.7% and 35.6% for the late fusion baseline (gains of 2.5 and 3.8 percentage points, respectively). Moreover, PPG-AFM remains more stable when modalities are missing. These findings confirm that incorporating Pre- and Post-Gating into the attention framework not only improves predictive accuracy but also enhances model robustness, offering strong potential for practical deployment in dermatological decision support.
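The abstract names the stages of the pipeline but does not give their equations. The following is therefore only a minimal pure-Python sketch of how those stages could fit together, not the authors' implementation. It rests on several assumptions: element-wise sigmoid pre-gates, parameter-free scaled dot-product self-attention over one token per modality, a scalar sigmoid post-gate, temperature-scaled softmax pooling, and modality dropout implemented as masking. All function and parameter names (`pre_gate`, `post_gate`, `post_u`, `score_v`, `fuse`, and so on) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float], tau: float = 1.0) -> Vector:
    """Temperature-scaled softmax: a smaller tau gives sharper (more peaked) weights."""
    scaled = [s / tau for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pre_gate(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Assumed pre-gate: g = sigmoid(W x + b) element-wise, returns g * x."""
    return [sigmoid(dot(row, x) + bi) * xi for row, bi, xi in zip(w, b, x)]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention across modality tokens.

    Simplification: identity Q/K/V projections instead of learned ones.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        attn = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        out.append([sum(a * v[j] for a, v in zip(attn, tokens)) for j in range(d)])
    return out


def post_gate(h: Vector, u: Vector) -> Vector:
    """Assumed post-gate: a scalar sigmoid(u . h) rescales each attended token."""
    g = sigmoid(dot(u, h))
    return [g * hj for hj in h]


def modality_dropout(present: List[bool], p: float, rng: random.Random) -> List[bool]:
    """Drop each available modality with probability p, always keeping at least one."""
    if not any(present):
        raise ValueError("at least one modality must be available")
    kept = [flag and rng.random() >= p for flag in present]
    if not any(kept):
        kept[rng.choice([i for i, flag in enumerate(present) if flag])] = True
    return kept


def fuse(embeddings: List[Vector], present: List[bool], pre_w: List[Matrix],
         pre_b: List[Vector], post_u: Vector, score_v: Vector,
         tau: float = 1.0) -> Tuple[Vector, Dict[int, float]]:
    """Pre-gate -> self-attention -> post-gate -> softmax pooling with temperature tau.

    Missing modalities are masked out, so they receive neither attention nor
    pooling weight. Returns the fused vector and a {modality index: weight} map.
    """
    idx = [i for i, flag in enumerate(present) if flag]
    if not idx:
        raise ValueError("at least one modality must be present")
    gated = [pre_gate(embeddings[i], pre_w[i], pre_b[i]) for i in idx]
    tokens = [post_gate(h, post_u) for h in self_attention(gated)]
    alpha = softmax([dot(score_v, h) for h in tokens], tau)
    d = len(tokens[0])
    fused = [sum(a * h[j] for a, h in zip(alpha, tokens)) for j in range(d)]
    return fused, dict(zip(idx, alpha))


if __name__ == "__main__":
    rng = random.Random(0)
    d, n_mod = 4, 3  # hypothetical: image, handcrafted-feature, and metadata tokens

    def rand_vec() -> Vector:
        return [rng.uniform(-1.0, 1.0) for _ in range(d)]

    embeddings = [rand_vec() for _ in range(n_mod)]
    pre_w = [[rand_vec() for _ in range(d)] for _ in range(n_mod)]
    pre_b = [rand_vec() for _ in range(n_mod)]
    present = modality_dropout([True] * n_mod, p=0.3, rng=rng)
    fused, weights = fuse(embeddings, present, pre_w, pre_b, rand_vec(), rand_vec(), tau=0.5)
    print(present, {i: round(w, 3) for i, w in weights.items()})
```

Masking absent modalities, rather than zero-filling them, is only one possible design choice. It keeps a missing input from attracting attention or pooling weight. In a real model, every weight above would be learned. `modality_dropout` would be applied per training sample before `fuse`. Lowering `tau` concentrates the pooling weight on the highest-scoring modality.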