Modality-Aware Adaptive-Integration Guided Single-Stream Network for RGB-T Saliency Detection


Abstract

RGB-T saliency detection has recently become a hot topic in the saliency detection field. However, existing works (especially CNN-based methods) usually use a two-stream structure to separately extract saliency cues from RGB and thermal infrared images and then integrate them into the final detection result. This strategy greatly increases the parameter scale, and the multi-modal fusion result is also very sensitive to the quality of each modality. Based on the above observations, we develop a novel Modality-Aware Adaptive-Integration Guided Single-Stream Network (MAANet) to detect salient objects from RGB-T image pairs. The feature pyramid network (FPN) is adopted as the basic structure of MAANet. To tactfully fuse the two complementary modalities: (1) in the encoder, the RGB and thermal infrared images are concatenated into a 4-channel input; (2) in the decoder, we propose a novel Modality-Aware Adaptive-Integration based Attention Mechanism (MAAM) that enables the decoder to optimally fuse the two modalities and produce more accurate saliency predictions; (3) finally, a novel coarse-and-refined bidirectional optimization (CRBO) method is proposed to suppress irrelevant background regions in the saliency map generated by the decoder. Compared with previous RGB-T methods, the proposed MAANet better exploits the advantages of both modalities, is not sensitive to the quality of either one, and is also more lightweight. Extensive experiments demonstrate that the proposed model performs favorably against most state-of-the-art RGB-T methods under different evaluation metrics, and even outperforms most RGB and RGB-D methods.
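To make the single-stream idea concrete, the PyTorch sketch below shows the 4-channel concatenation described in (1) and one plausible form a modality-aware spatial gate like MAAM might take. The abstract does not give the actual architecture, so the module name `MAAMSketch`, the gating design, and all tensor shapes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MAAMSketch(nn.Module):
    """Hypothetical modality-aware gate: predicts per-pixel weights for two
    modality-specific feature maps and fuses them adaptively. This is a
    guess at the general mechanism, not the paper's MAAM."""
    def __init__(self, channels):
        super().__init__()
        # Two spatial attention maps (one per modality) that sum to 1.
        self.gate = nn.Sequential(
            nn.Conv2d(channels, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )

    def forward(self, feat_rgb, feat_thermal):
        w = self.gate(feat_rgb + feat_thermal)        # (B, 2, H, W)
        w_rgb, w_t = w[:, 0:1], w[:, 1:2]
        return w_rgb * feat_rgb + w_t * feat_thermal  # adaptive integration

# Single-stream encoder input: RGB (3 channels) + thermal (1 channel).
rgb = torch.randn(1, 3, 224, 224)
thermal = torch.randn(1, 1, 224, 224)
x = torch.cat([rgb, thermal], dim=1)                  # (1, 4, 224, 224)
stem = nn.Conv2d(4, 64, kernel_size=3, padding=1)     # 4-channel stem conv
print(stem(x).shape)                                  # torch.Size([1, 64, 224, 224])
```

A single stem consuming the 4-channel tensor is what keeps the network single-stream: only the first convolution changes relative to an RGB backbone, which is consistent with the lightweight claim above.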
