Multimodal Brain Tumor Classification via Triple Fusion Attention and Transformer-Based Feature Integration
Abstract
Brain tumor classification from multimodal neuroimaging remains challenging because of the heterogeneous appearance of tumors across imaging modalities and the complexity of integrating features meaningfully. To address these limitations, this research proposes a novel hybrid fusion pipeline that leverages the complementary strengths of the MRI and PET modalities through a synergistic combination of advanced deep learning and radiomics techniques. The pipeline begins with modality-specific denoising to enhance data quality, followed by precise tumor segmentation using a Multimodal Swin Transformer U-Net (MM-SwinUNet), capable of modeling cross-modal interactions and long-range dependencies. For feature extraction, Vision Transformer (ViT) embeddings capture rich semantic representations from MRI, while handcrafted and CNN-based deep radiomics features extracted from PET preserve modality-specific morphological and metabolic information. These distinct features are then unified by a Triple Fusion Attention Module (TFAM), which dynamically attends to relevant features across modalities to form robust fused representations. To mitigate high dimensionality and enhance class separability, the fused features undergo Supervised UMAP embedding with Local Discriminant Structure Preservation. The final classification is performed by a SimCLR-pretrained ViT model, fine-tuned on the fused feature space to exploit contrastive pretraining and improve generalization. This pipeline demonstrates improved classification accuracy, enhanced modality synergy, and clinical interpretability, and holds significant promise for aiding diagnostic decision-making and advancing the role of AI in neuro-oncology.
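The abstract names the pipeline's building blocks but gives no implementation details, so the sketches below are illustrative reconstructions, not the authors' code. For the segmentation stage, MONAI's SwinUNETR is a close public relative of a Swin-Transformer U-Net; stacking co-registered MRI and PET as input channels is one simple way to make it multimodal, though the paper's MM-SwinUNet presumably models cross-modal interaction more explicitly than channel concatenation does.

```python
import torch
from monai.networks.nets import SwinUNETR  # pip install monai

# Two input channels: co-registered MRI and PET volumes stacked channel-wise.
# Note: MONAI versions before 1.5 additionally require img_size=(96, 96, 96).
model = SwinUNETR(in_channels=2, out_channels=2, feature_size=24)
model.eval()

volume = torch.randn(1, 2, 96, 96, 96)  # (batch, modalities, D, H, W)
with torch.no_grad():
    logits = model(volume)               # (1, 2, 96, 96, 96) tumor vs. background
print(logits.shape)
```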
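For the feature-extraction stage, off-the-shelf tooling exists for both halves. The sketch below assumes timm for the ViT embeddings and pyradiomics for the handcrafted PET features; these are plausible tool choices, not ones stated in the abstract, and the image data is synthetic.

```python
import numpy as np
import SimpleITK as sitk
import timm
import torch
from radiomics import featureextractor  # pip install pyradiomics

# ViT embeddings for MRI: num_classes=0 makes timm return pooled features.
vit = timm.create_model('vit_base_patch16_224', pretrained=True, num_classes=0)
vit.eval()
mri_batch = torch.randn(1, 3, 224, 224)      # stand-in preprocessed MRI slice
with torch.no_grad():
    mri_embedding = vit(mri_batch)           # shape (1, 768)

# Handcrafted radiomics for PET, computed on a synthetic volume and tumor mask.
pet = sitk.GetImageFromArray(np.random.rand(32, 64, 64).astype(np.float32))
mask_arr = np.zeros((32, 64, 64), dtype=np.uint8)
mask_arr[10:20, 20:40, 20:40] = 1            # fake tumor region
mask = sitk.GetImageFromArray(mask_arr)
extractor = featureextractor.RadiomicsFeatureExtractor()
pet_features = extractor.execute(pet, mask)  # dict of named radiomics features
print(mri_embedding.shape, len(pet_features))
```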
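The internal structure of the Triple Fusion Attention Module is not specified in the abstract. The following PyTorch sketch assumes three attention branches (MRI attending to PET, PET attending to MRI, and joint self-attention over both token sets), which is one guess at what "triple fusion attention" could mean; the class name and layout are hypothetical.

```python
import torch
import torch.nn as nn

class TripleFusionAttention(nn.Module):
    """Hypothetical sketch of a triple-branch cross-modal fusion block.

    Branches: MRI queries PET, PET queries MRI, and a joint self-attention
    pass over the concatenated tokens. The paper's TFAM may differ; this
    only illustrates the general idea of attention-based fusion.
    """

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.mri_to_pet = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.pet_to_mri = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.joint = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(3 * dim, dim)   # fuse the three branch outputs
        self.norm = nn.LayerNorm(dim)

    def forward(self, mri: torch.Tensor, pet: torch.Tensor) -> torch.Tensor:
        # mri, pet: (batch, tokens, dim) token sequences per modality
        a, _ = self.mri_to_pet(query=mri, key=pet, value=pet)  # MRI queries PET
        b, _ = self.pet_to_mri(query=pet, key=mri, value=mri)  # PET queries MRI
        joint_in = torch.cat([mri, pet], dim=1)
        c, _ = self.joint(joint_in, joint_in, joint_in)        # joint self-attention
        # Mean-pool each branch to one vector, then project to the fused space.
        fused = torch.cat([a.mean(1), b.mean(1), c.mean(1)], dim=-1)
        return self.norm(self.proj(fused))

fusion = TripleFusionAttention(dim=256)
mri_tokens = torch.randn(8, 197, 256)  # e.g. ViT patch embeddings for MRI
pet_tokens = torch.randn(8, 64, 256)   # e.g. projected PET radiomics features
print(fusion(mri_tokens, pet_tokens).shape)  # torch.Size([8, 256])
```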
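The dimensionality-reduction step maps onto the umap-learn package, which switches into supervised mode when class labels are passed to fit_transform. The "Local Discriminant Structure Preservation" refinement is specific to the paper and is not reproduced here; this sketch shows only plain supervised UMAP on synthetic stand-in features.

```python
import umap  # pip install umap-learn
from sklearn.datasets import make_classification

# Stand-in for the fused TFAM features: 400 samples, 512 dims, 3 tumor classes.
X, y = make_classification(n_samples=400, n_features=512, n_informative=40,
                           n_classes=3, random_state=0)

# Passing y switches umap-learn into supervised mode, pulling same-class
# samples together in the low-dimensional embedding.
reducer = umap.UMAP(n_components=32, n_neighbors=15, min_dist=0.1,
                    random_state=0)
Z = reducer.fit_transform(X, y)
print(Z.shape)  # (400, 32)
```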
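Finally, the classifier is a ViT pretrained with SimCLR-style contrastive learning. For reference, this is the NT-Xent (normalized temperature-scaled cross-entropy) objective from the published SimCLR formulation (Chen et al., 2020), written independently of the authors' code.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor,
                 temperature: float = 0.5) -> torch.Tensor:
    """NT-Xent loss over two augmented views of the same batch.

    z1, z2: (batch, dim) projection-head outputs for the two views.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2n, dim), unit norm
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # exclude self-pairs
    # The positive for sample i is its other view: index i+n in the first
    # half of the batch, index i-n in the second half.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(16, 128), torch.randn(16, 128)
print(nt_xent_loss(z1, z2).item())
```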