A Transformer-Driven Hybrid Feature Fusion Framework for Multi-Modal Medical Image Analysis

Abstract

Early disease diagnosis depends heavily on robust medical image classification models. This paper proposes a hybrid method that combines handcrafted descriptors (HOG, BoVW) with deep features (VGG19) to form a fused feature representation. The combined features are fed into an optimized Vision Transformer (FFXViT), enabling stronger global context modelling while preserving key local information. Experiments were conducted on two reference modalities: histopathology images with three classes (adenocarcinoma, squamous cell carcinoma, benign) and chest X-ray images with four classes (COVID-19, lung opacity, normal, viral pneumonia). The proposed FFXViT attained accuracies of 99.50% on histopathology and 97.41% on chest X-rays, a marked improvement over state-of-the-art CNN, transformer, and hybrid baselines. The experiments showcase the scalability, robustness, and interpretability of the framework and empirically establish FFXViT as a viable solution for robust cross-modality medical image analysis and clinical decision support.
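To make the fusion pipeline concrete, the sketch below illustrates the general idea in Python: handcrafted HOG features and VGG19 deep features are extracted from an image, projected into a shared embedding space, and classified by a small transformer encoder. This is a minimal stand-in, not the authors' FFXViT: the BoVW branch is omitted, the transformer is a stock PyTorch encoder rather than the paper's optimized architecture, and all layer sizes, token layout, and the four-class head are illustrative assumptions.

```python
# Minimal sketch of handcrafted + deep feature fusion with a transformer
# classifier head. Assumptions (not from the paper): one token per feature
# branch, d_model=256, 2 encoder layers, 4 output classes.
import numpy as np
import torch
import torch.nn as nn
from skimage.feature import hog
from torchvision.models import vgg19, VGG19_Weights

class FusionTransformer(nn.Module):
    """Fuses a handcrafted and a deep feature vector via a small
    transformer encoder (a stand-in for the paper's FFXViT head)."""
    def __init__(self, hog_dim, deep_dim, num_classes, d_model=256):
        super().__init__()
        # One learned token per branch, so self-attention can weigh
        # handcrafted against deep evidence (global context over both).
        self.hog_proj = nn.Linear(hog_dim, d_model)
        self.deep_proj = nn.Linear(deep_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, hog_vec, deep_vec):
        # (B, 2, d_model): a two-token sequence, one token per branch.
        tokens = torch.stack([self.hog_proj(hog_vec),
                              self.deep_proj(deep_vec)], dim=1)
        return self.head(self.encoder(tokens).mean(dim=1))  # pool -> logits

# --- feature extraction for one 224x224 RGB image (stand-in input) ---------
backbone = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()
img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)

hog_vec = hog(img, channel_axis=-1)            # handcrafted texture/shape cues
x = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0) / 255.0
with torch.no_grad():                          # (in practice, normalize with
    deep_vec = backbone(x).flatten(1)          #  ImageNet stats first)

model = FusionTransformer(hog_dim=hog_vec.size, deep_dim=deep_vec.shape[1],
                          num_classes=4)       # e.g. the four X-ray classes
logits = model(torch.from_numpy(hog_vec).float().unsqueeze(0), deep_vec)
print(logits.shape)                            # torch.Size([1, 4])
```

Projecting each branch to its own token before the encoder is one plausible way to let attention arbitrate between local handcrafted cues and global deep features; the paper's actual fusion and tokenization scheme may differ.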
