DHAFGan: A Dense Hybrid Attention Fusion Generative Adversarial Network for Infrared and Visible Image Fusion
Abstract
To address the shortcomings of current infrared and visible image fusion algorithms, including insufficient perception of typical features, poor visual quality of the fused results, and underutilization of important secondary information, this paper proposes an infrared and visible image fusion algorithm based on shallow-deep feature extraction and dual-channel hybrid attention. First, a shallow-deep feature extraction module is constructed. It uses shallow convolutional layers and deep multi-scale receptive-field units to extract surface-level features and deep semantic information from the source images, respectively, achieving multi-level multimodal feature extraction. Second, a Dual-Channel Hybrid Attention Fusion Module (DCAFM) is constructed, in which spatial attention focuses on the salient regions of the image and channel attention strengthens informative feature channels, enhancing the fusion of multimodal features. Finally, primary and secondary feature loss functions are formulated to constrain both the generator and the discriminator, facilitating the extraction of latent secondary feature information from the source images. Experimental results on the DroneVehicle dataset demonstrate that the proposed algorithm achieves superior performance in both subjective visual evaluation and objective metrics. Quantitative evaluations show that our method outperforms seven state-of-the-art approaches, achieving the highest scores in standard deviation (SD = 9.3541), mutual information (MI = 2.4321), and peak signal-to-noise ratio (PSNR = 65.7852), while ranking second in average gradient (AG = 3.9854). The fused images generated by our method not only conform to human visual perception characteristics but also retain rich detail, effectively preserving both dominant and subtle features from the source modalities.
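To make the fusion step concrete, the following is a minimal PyTorch sketch of a dual-channel hybrid attention fusion module in the spirit of the DCAFM described above, combining channel attention and spatial attention over concatenated infrared and visible features. The abstract does not specify layer sizes or the exact attention formulation, so the class names, channel counts, reduction ratio, and kernel size below are illustrative assumptions rather than the authors' implementation.

```python
# Hedged sketch of a DCAFM-style fusion block (assumed CBAM-like design).
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Reweights feature channels using pooled global statistics."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average-pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max-pooling branch
        weights = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * weights


class SpatialAttention(nn.Module):
    """Highlights salient spatial regions with a single-channel attention map."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)    # per-pixel mean over channels
        mx = x.amax(dim=1, keepdim=True)     # per-pixel max over channels
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn


class DCAFM(nn.Module):
    """Fuses infrared and visible feature maps with hybrid channel/spatial attention."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.channel_attn = ChannelAttention(channels * 2)
        self.spatial_attn = SpatialAttention()
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
        x = torch.cat([feat_ir, feat_vis], dim=1)  # stack modality features
        x = self.channel_attn(x)                   # strengthen informative channels
        x = self.spatial_attn(x)                   # focus on salient regions
        return self.fuse(x)                        # project back to the feature width


if __name__ == "__main__":
    ir = torch.randn(1, 64, 128, 128)
    vis = torch.randn(1, 64, 128, 128)
    fused = DCAFM(channels=64)(ir, vis)
    print(fused.shape)  # torch.Size([1, 64, 128, 128])
```

In this sketch the channel branch reweights concatenated modality features and the spatial branch then emphasizes salient regions, matching the two roles the abstract assigns to channel and spatial attention; how the paper actually orders or combines the two branches is not stated and would need to be checked against the full text.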