Enhancing Infrared-Visible Image Fusion via Text-Guided Adaptive Feature Integration

Abstract

Image fusion techniques aim to integrate complementary information from multiple modalities, such as infrared and visible images, to generate enhanced images that preserve both texture details and salient targets. Traditional methods often overemphasize low-level visual features and neglect high-level semantic information, which limits their performance in downstream applications. This paper proposes a text-guided adaptive fusion network that incorporates language-based textual descriptions during feature extraction to capture semantic information effectively. An Adaptive Attention Fusion module dynamically integrates critical features from both modalities, while a simplified ResFormer module enhances the network's ability to perceive local details and global structures. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in both subjective visual quality and objective metrics, achieving significant improvements in high-level vision tasks such as semantic segmentation and object detection (e.g., an 8% increase in mIoU for semantic segmentation on the MSRS dataset). Our findings underscore the potential of text-guided fusion networks in advancing image fusion technology. The code and datasets are available at https://github.com/VCMHE/TGAF.
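To make the fusion idea concrete, the following minimal PyTorch sketch shows one plausible form of text-conditioned adaptive attention fusion: a text embedding modulates the infrared and visible feature maps, and a small convolutional head predicts per-pixel blending weights. All module names, dimensions, and the FiLM-style modulation here are illustrative assumptions, not the authors' released TGAF implementation (see the linked repository for that).

# Minimal sketch of text-guided adaptive fusion; names and shapes are assumptions.
import torch
import torch.nn as nn


class AdaptiveAttentionFusion(nn.Module):
    """Blend infrared and visible feature maps using text-conditioned weights."""

    def __init__(self, feat_dim: int, text_dim: int):
        super().__init__()
        # Project the text embedding into the image-feature channel space.
        self.text_proj = nn.Linear(text_dim, feat_dim)
        # Predict two per-pixel fusion weights from the concatenated features.
        self.weight_net = nn.Sequential(
            nn.Conv2d(feat_dim * 2, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, 2, kernel_size=1),
        )

    def forward(self, feat_ir, feat_vis, text_emb):
        # Modulate both modalities with the projected text embedding.
        t = self.text_proj(text_emb)[:, :, None, None]  # (B, C, 1, 1)
        feat_ir = feat_ir * t.sigmoid()
        feat_vis = feat_vis * t.sigmoid()
        # Softmax over the two modality channels gives adaptive fusion weights.
        w = self.weight_net(torch.cat([feat_ir, feat_vis], dim=1)).softmax(dim=1)
        return w[:, :1] * feat_ir + w[:, 1:] * feat_vis  # fused feature map


# Example: fuse 64-channel features guided by a 512-d text embedding (e.g., CLIP-sized).
fuse = AdaptiveAttentionFusion(feat_dim=64, text_dim=512)
fused = fuse(
    torch.randn(1, 64, 128, 128),  # infrared features
    torch.randn(1, 64, 128, 128),  # visible features
    torch.randn(1, 512),           # text embedding
)
print(fused.shape)  # torch.Size([1, 64, 128, 128])

The key design point illustrated here is that the fusion weights are spatially varying and conditioned on the text, so semantically salient regions can draw more from one modality than the other rather than being averaged uniformly.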
