AdverFuse: Robust fusion of multimodal images based on dynamic attention and adversarial learning
Abstract
Current multimodal image fusion methods typically cannot precisely locate and fuse the features of key regions, which leads to artifacts in the fusion results, weak local feature expression, and loss of integrity in image structures and target contours. To address these issues, we propose a universal fusion framework called AdverFuse. Built on the Mamba module, the framework introduces an adaptive-weight mixed attention module that specifically targets fusion artifacts. The module enhances cross-modal feature complementarity through its channel branch, precisely locates target regions with spatial attention, and adjusts each modality's contribution according to the confidence scores produced by the spatial and channel attention mechanisms, so that feature enhancement better matches the characteristics of the different modalities. In addition, we design an adversarial feature-enhancement registration module to improve the local feature expression of the fused images. Combined with the adversarial training of a discriminator, this module adaptively balances the contribution weights of modalities such as infrared and visible light, maps features from different modalities into a unified semantic space, and extracts richer semantic features while preserving local detail. We further design a convolutional attention module that enables more comprehensive feature interaction while reducing computational complexity. Experimental results on multiple datasets demonstrate that the proposed method offers significant advantages and outperforms state-of-the-art methods.
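As a rough illustration of the confidence-weighted mixing of channel and spatial attention described in the abstract, the following PyTorch-style sketch shows one plausible implementation. The module name AdaptiveWeightMixedAttention, the layer choices, and the confidence head are assumptions made for illustration only, not the authors' released code.

```python
# Hypothetical sketch of an adaptive-weight mixed attention block (names and layers assumed).
import torch
import torch.nn as nn


class AdaptiveWeightMixedAttention(nn.Module):
    """Mixes infrared and visible features via channel and spatial attention,
    then reweights each modality by a learned confidence score."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel branch: squeeze-and-excitation style gating for cross-modal complementarity.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial branch: locates salient target regions from pooled feature maps.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Confidence head: produces per-modality weights from both attention outputs.
        self.confidence = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )

    def attend(self, x: torch.Tensor) -> torch.Tensor:
        # Apply channel attention, then spatial attention, to a single modality.
        x = x * self.channel_gate(x)
        avg_map = x.mean(dim=1, keepdim=True)
        max_map = x.max(dim=1, keepdim=True).values
        return x * self.spatial_gate(torch.cat([avg_map, max_map], dim=1))

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        ir_att, vis_att = self.attend(ir), self.attend(vis)
        # Confidence scores decide how much each modality contributes to the fused feature.
        w = self.confidence(torch.cat([ir_att, vis_att], dim=1))  # shape (B, 2, 1, 1)
        return w[:, 0:1] * ir_att + w[:, 1:2] * vis_att


if __name__ == "__main__":
    # Example usage with dummy infrared and visible feature maps.
    block = AdaptiveWeightMixedAttention(channels=64)
    ir_feat = torch.randn(1, 64, 128, 128)
    vis_feat = torch.randn(1, 64, 128, 128)
    fused = block(ir_feat, vis_feat)
    print(fused.shape)  # torch.Size([1, 64, 128, 128])
```

The sketch only covers the attention-mixing idea; the Mamba backbone, the adversarial registration module with its discriminator, and the convolutional attention module described in the abstract are not represented here.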