Enhancing RGB-IR Object Detection: A Frozen Backbone Approach with Multi-Receptive Field Attention

Bingyu Lu
Haoyuan Liu
Hiroshi Watanabe

Abstract

Recent advancements in multimodal object detection have predominantly relied on end-to-end training paradigms, which, while effective, demand substantial computational resources and risk feature degradation. To address these challenges, we propose a frozen backbone paradigm, preserving pretrained representations as stable semantic anchors for efficient multimodal fusion. Our approach introduces a lightweight Multi-Receptive Field Attention (MRFA) mechanism, enhancing feature interaction and representation diversity without exhaustive retraining. Experiments on the FLIR Aligned and M3FD datasets demonstrate consistent improvements over state-of-the-art end-to-end models, highlighting the potential of pretrained backbones coupled with adaptive attention mechanisms for robust multimodal object detection. The project code is released at https://github.com/LuBingyu11/MRFA.

Version published to 10.21203/rs.3.rs-7956977/v1 on Research Square
Nov 13, 2025

Related articles

Lite-FARNet: A Light-weight Feedback Attention Residual Network for Efficient Multi-Class Segmentation in Complex Urban Scenes

This article has 3 authors:
1. Jiaxi Yang
2. Jiaquan Shen
3. Shitong Wang
This article has no evaluations. Latest version: Dec 23, 2025

Two-Stage Fine-Tuning of Large Vision-Language Models with Hierarchical Prompting for Few-Shot Object Detection in Remote Sensing Images

This article has 7 authors:
1. Yongqi Shi
2. Ruopeng Yang
3. Changsheng Yin
4. Yiwei Lu
5. Bo Huang
6. Yu Tao
7. Yihao Zhong
This article has no evaluations. Latest version: Jan 14, 2026

TriORU2-Net++: Attention-Guided Three-Stage U2-Net++ for Light Field Occlusion Removal

This article has 5 authors:
1. Mostafa Farouk Senussi
2. Mahmoud Abdalla
3. Mahmoud SalahEldin Kasem
4. Mohamed Mahmoud
5. Hyun-Soo Kang
This article has no evaluations. Latest version: Jan 19, 2026
