AdverFuse: Robust fusion of multimodal images based on dynamic attention and adversarial learning
Abstract
Current multimodal image fusion methods typically cannot precisely locate and fuse the features of key regions, which leads to artifacts in the fusion results, weak local feature expression, and loss of integrity in image structures and target contours. To address these issues, we propose a universal fusion framework called AdverFuse. Built on the Mamba module, the framework introduces an adaptive-weight mixed attention module that specifically targets fusion artifacts. The module enhances cross-modal feature complementarity through its channel branch, precisely locates target regions with spatial attention, and adjusts each modality's contribution according to the confidence scores produced by the spatial and channel attention mechanisms, so that feature enhancement better matches the characteristics of the different modalities. In addition, we design an adversarial feature-enhancement registration module to improve the local feature expression of the fused images. Combined with the adversarial training of a discriminator, this module adaptively balances the contribution weights of modalities such as infrared and visible light, maps features from different modalities into a unified semantic space, and extracts richer semantic features while preserving local detail. We further design a convolutional attention module that enables more comprehensive feature interaction while reducing computational complexity. Experimental results on multiple datasets demonstrate that the proposed method offers significant advantages and outperforms state-of-the-art methods.
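As a rough illustration of the confidence-weighted mixing of channel and spatial attention described in the abstract, the following PyTorch-style sketch shows one plausible implementation. The module name AdaptiveWeightMixedAttention, the layer choices, and the confidence head are assumptions made for illustration only, not the authors' released code.

```python
# Hypothetical sketch of an adaptive-weight mixed attention block (names and layers assumed).
import torch
import torch.nn as nn


class AdaptiveWeightMixedAttention(nn.Module):
    """Mixes infrared and visible features via channel and spatial attention,
    then reweights each modality by a learned confidence score."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel branch: squeeze-and-excitation style gating for cross-modal complementarity.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial branch: locates salient target regions from pooled feature maps.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Confidence head: produces per-modality weights from both attention outputs.
        self.confidence = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )

    def attend(self, x: torch.Tensor) -> torch.Tensor:
        # Apply channel attention, then spatial attention, to a single modality.
        x = x * self.channel_gate(x)
        avg_map = x.mean(dim=1, keepdim=True)
        max_map = x.max(dim=1, keepdim=True).values
        return x * self.spatial_gate(torch.cat([avg_map, max_map], dim=1))

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        ir_att, vis_att = self.attend(ir), self.attend(vis)
        # Confidence scores decide how much each modality contributes to the fused feature.
        w = self.confidence(torch.cat([ir_att, vis_att], dim=1))  # shape (B, 2, 1, 1)
        return w[:, 0:1] * ir_att + w[:, 1:2] * vis_att


if __name__ == "__main__":
    # Example usage with dummy infrared and visible feature maps.
    block = AdaptiveWeightMixedAttention(channels=64)
    ir_feat = torch.randn(1, 64, 128, 128)
    vis_feat = torch.randn(1, 64, 128, 128)
    fused = block(ir_feat, vis_feat)
    print(fused.shape)  # torch.Size([1, 64, 128, 128])
```

The sketch only covers the attention-mixing idea; the Mamba backbone, the adversarial registration module with its discriminator, and the convolutional attention module described in the abstract are not represented here.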