FusionNeXt-XtremeNet: A Deep Ensemble Model with LLM-Aided Clinical Report Generation for Dermoscopic Image Classification

Abstract

This paper presents FusionNeXt-XtremeNet, a novel deep ensemble architecture that combines ConvNeXt, Vision Transformer (ViT), and EfficientNetV2 for classifying dermoscopic images based on acquisition types. To improve clinical interpretability, a GPT-2-based Large Language Model (LLM) enhanced by the Language-augmented Multimodal Attention (LeMMA) mechanism is integrated to generate structured diagnostic reports. Evaluated on the ISIC 2020--2022 dataset of 1,767 images, the model achieves state-of-the-art performance in binary classification (94.1% accuracy, 94.1% F1-score, 0.969 ROC-AUC), three-class classification (90.6% accuracy, 90.8% F1-score), and four-class classification (87.6% accuracy, 87.8% F1-score). The LeMMA-augmented GPT-2 generates clinically relevant reports with a BLEU score of 0.85, reduces generation time by 15.2% relative to the baseline, and receives high dermatologist evaluation scores (accuracy: 4.3/5, relevance: 4.4/5). Grad-CAM visualisations show strong alignment with clinical features (r=0.82, p<0.001), with 85% of attention regions corresponding to dermatologically significant patterns. This dual framework not only enhances prediction reliability but also bridges the gap between black-box AI models and clinical usability through explainable, text-based outputs.
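The abstract does not specify how the three backbones' predictions are fused; a common choice for deep ensembles is soft voting, i.e. averaging the per-class softmax probabilities of each backbone. The sketch below illustrates that idea on toy logits; the function names, the uniform weights, and the toy values are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_vote(logits_per_model: list, weights=None) -> np.ndarray:
    """Fuse several backbones by averaging their class probabilities
    (soft voting) and return the predicted class index per image."""
    probs = np.stack([softmax(l) for l in logits_per_model])  # (models, images, classes)
    if weights is None:  # assumption: equal weighting of the three backbones
        weights = np.full(len(logits_per_model), 1.0 / len(logits_per_model))
    fused = np.tensordot(weights, probs, axes=1)              # (images, classes)
    return fused.argmax(axis=-1)

# Toy example: 3 backbones, 2 images, binary (benign vs malignant) logits
convnext = np.array([[2.0, 0.1], [0.2, 1.5]])
vit      = np.array([[1.5, 0.3], [0.1, 2.0]])
effnetv2 = np.array([[1.8, 0.2], [0.5, 1.0]])
preds = soft_vote([convnext, vit, effnetv2])  # one class index per image
```

Weighted (rather than uniform) voting follows by passing a `weights` vector, e.g. proportional to each backbone's validation accuracy.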