FusionNeXt-XtremeNet: A Deep Ensemble Model with LLM-Aided Clinical Report Generation for Dermoscopic Image Classification
Abstract
This paper presents FusionNeXt-XtremeNet, a novel deep ensemble architecture that combines ConvNeXt, Vision Transformer (ViT), and EfficientNetV2 for classifying dermoscopic images based on acquisition types. To improve clinical interpretability, a GPT-2-based Large Language Model (LLM) enhanced by the Language-augmented Multimodal Attention (LeMMA) mechanism is integrated to generate structured diagnostic reports. The model was evaluated on the ISIC 2020--2022 dataset of 1,767 images and achieves state-of-the-art performance in binary classification (94.1% accuracy, 94.1% F1-score, 0.969 ROC-AUC), three-class classification (90.6% accuracy, 90.8% F1-score), and four-class classification (87.6% accuracy, 87.8% F1-score). The LeMMA-augmented GPT-2 generates clinically relevant reports with a BLEU score of 0.85, reduces generation time by 15.2% compared to the baseline, and achieves high dermatologist evaluation scores (accuracy: 4.3/5, relevance: 4.4/5). Grad-CAM visualisations demonstrate strong alignment with clinical features (r=0.82, p<0.001), with 85% of attention regions corresponding to dermatologically significant patterns. This dual framework not only enhances prediction reliability but also bridges the gap between black-box AI models and clinical usability through explainable, text-based outputs.
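The abstract does not specify how the three backbone predictions are fused. A minimal soft-voting sketch, assuming the ensemble averages per-model softmax probabilities (the fusion rule, weights, and the toy probability values below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Fuse per-model class probabilities by (optionally weighted) averaging.

    prob_list: list of (n_samples, n_classes) softmax outputs, one per model.
    Returns the fused class predictions and the fused probability matrix.
    """
    probs = np.stack(prob_list)  # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.full(len(prob_list), 1.0 / len(prob_list))
    weights = np.asarray(weights, dtype=float).reshape(-1, 1, 1)
    fused = (probs * weights).sum(axis=0)
    return fused.argmax(axis=1), fused

# Hypothetical softmax outputs from the three backbones
# for 2 samples and 3 acquisition-type classes.
convnext = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]])
vit      = np.array([[0.6, 0.3, 0.1], [0.1, 0.6, 0.3]])
effnetv2 = np.array([[0.8, 0.1, 0.1], [0.3, 0.4, 0.3]])

preds, fused = soft_vote([convnext, vit, effnetv2])
```

With equal weights this reduces to a plain average of the three probability vectors; per-model weights could instead be tuned on a validation split.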