A Unified Vision Transformer and Convolutional Neural Network Framework for Multi-Domain Cancer Classification
Abstract
Accurate and reliable classification of cancer from medical imaging is essential for effective computer-aided diagnosis. In this study, we conduct a comprehensive evaluation of three deep learning architectures: Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and a hybrid model (HViT-CNN) that integrates a CNN backbone with transformer-based attention mechanisms. These models are benchmarked across three diverse and clinically relevant imaging modalities: brain magnetic resonance imaging (MRI), dermoscopic images for skin cancer, and cytology slides for cervical cancer. While CNNs excel at capturing local texture features and ViTs at modeling global spatial relationships, both architectures exhibit modality-specific limitations. The proposed HViT-CNN addresses these limitations by combining localized feature extraction with global contextual reasoning. Across all datasets, the hybrid model consistently achieved the highest classification accuracy, 98.4% for brain tumors, 98.0% for skin cancer, and 99.0% for cervical cancer, outperforming its individual components. These results underscore the effectiveness of hybrid architectures in handling both coarse- and fine-grained image features, and highlight their potential for advancing generalizable, high-precision diagnostic tools in medical image analysis.
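The hybrid design the abstract describes, local convolutional feature extraction followed by transformer-style global attention over patch tokens, can be sketched in miniature. The sketch below is illustrative only: the single 2-D convolution, single-head attention, patch size, and all weight shapes are assumptions for demonstration, not the paper's HViT-CNN implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, k):
    """Valid 2-D convolution (single channel) -- stands in for the CNN backbone."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def patchify(fmap, p):
    """Split the feature map into flattened p x p patch tokens (ViT-style)."""
    H, W = fmap.shape
    toks = [fmap[i:i + p, j:j + p].ravel()
            for i in range(0, H - p + 1, p)
            for j in range(0, W - p + 1, p)]
    return np.stack(toks)

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product attention -- models global context."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

def hybrid_forward(image, kernel, Wq, Wk, Wv, Wcls, p=4):
    """Hypothetical hybrid forward pass: CNN stage -> tokens -> attention -> head."""
    fmap = conv2d(image, kernel)              # local texture features (CNN stage)
    toks = patchify(fmap, p)                  # tokenize the feature map
    ctx = self_attention(toks, Wq, Wk, Wv)    # global reasoning (transformer stage)
    logits = ctx.mean(axis=0) @ Wcls          # mean-pool tokens, linear classifier
    e = np.exp(logits - logits.max())
    return e / e.sum()                        # class probabilities

# Toy usage: a 17x17 "image", 2x2 kernel -> 16x16 feature map -> 16 tokens of dim 16,
# classified into 3 hypothetical classes (e.g., tumor types).
image = rng.normal(size=(17, 17))
kernel = rng.normal(size=(2, 2))
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
Wcls = rng.normal(size=(16, 3))
probs = hybrid_forward(image, kernel, Wq, Wk, Wv, Wcls)
```

In a trained model each stage would of course be learned, stacked, and multi-headed; the point of the sketch is only the data flow, convolutional locality feeding token-level global attention, that the abstract credits for the hybrid's accuracy.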