Multiscale Cross-Attention of Hyperspectral and Multispectral Image Fusion Based on Transformer

Abstract

Owing to the limitations of imaging sensors, hyperspectral images (HSIs) typically suffer from low spatial resolution. To obtain HSIs with high spatial resolution, fusing them with high-spatial-resolution multispectral images (MSIs), known as HSI-MSI fusion, has become an effective and widely adopted technique. However, existing deep learning-based HSI-MSI fusion methods often struggle to capture both local details and global context, especially when features span multiple scales. To address these issues, we propose a novel Transformer-based multiscale cross-attention fusion network (MCA-Net). MCA-Net integrates three key innovations. First, the heterogeneous convolution parallel attention enhancement module (HCPAEM) combines dilated depthwise separable convolutions with parallel attention mechanisms to enhance the representation of both local and global features. Second, the multiscale local-global feature extraction module (MLGFEM) integrates convolutional neural networks (CNNs), Transformers, and multiscale feature extraction strategies to model non-local and complementary information at multiple scales. Finally, the deep cross-attention fusion module (DCAFM) employs a deep cross-attention mechanism to model the correlation between the HSI and the MSI, promoting comprehensive fusion of spatial-spectral features. To validate the effectiveness and superiority of MCA-Net, we conducted comparative experiments on five widely used HSI datasets: Pavia Centre, Pavia University, Washington DC, Botswana, and Chikusei. Experimental results demonstrate significant improvements over state-of-the-art fusion methods. For instance, on the Washington DC dataset, compared with the strongest baseline among the comparison algorithms, our method improves PSNR by 11.76% and reduces RMSE, ERGAS, and SAM by 44.4%, 44.71%, and 43.2%, respectively.
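
To make the cross-attention fusion idea concrete, the sketch below shows a minimal PyTorch module in the spirit of the DCAFM described above: queries are drawn from the HSI feature branch while keys and values come from the MSI feature branch, so spatial detail from the MSI guides the spectral features. This is not the authors' implementation; the class name, parameter choices, and tensor shapes are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of cross-attention fusion between
# HSI and MSI feature maps. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Queries from the HSI branch; keys/values from the MSI branch.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, hsi_feat: torch.Tensor, msi_feat: torch.Tensor) -> torch.Tensor:
        # hsi_feat, msi_feat: (B, C, H, W) feature maps with matched channels C.
        b, c, h, w = hsi_feat.shape
        q = self.norm_q(hsi_feat.flatten(2).transpose(1, 2))    # (B, H*W, C)
        kv = self.norm_kv(msi_feat.flatten(2).transpose(1, 2))  # (B, H*W, C)
        fused, _ = self.attn(q, kv, kv)  # HSI tokens attend to MSI tokens
        fused = fused + q                # residual keeps spectral content
        return fused.transpose(1, 2).reshape(b, c, h, w)

# Usage: fuse 64-channel feature maps from upsampled-HSI and MSI encoders.
fusion = CrossAttentionFusion(dim=64)
hsi = torch.randn(1, 64, 32, 32)
msi = torch.randn(1, 64, 32, 32)
out = fusion(hsi, msi)  # (1, 64, 32, 32)
```

Routing HSI features as queries and MSI features as keys/values is one common design choice for injecting spatial detail into a spectral branch; the paper's DCAFM may differ in depth, normalization, and how the two modalities are projected.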
