Enhancing RGB-IR Object Detection: A Frozen Backbone Approach with Multi-Receptive Field Attention
Abstract
Recent advances in multimodal object detection have relied predominantly on end-to-end training, which is effective but demands substantial computational resources and risks feature degradation. To address these challenges, we propose a frozen backbone paradigm that preserves pretrained representations as stable semantic anchors for efficient multimodal fusion. Our approach introduces a lightweight Multi-Receptive Field Attention (MRFA) mechanism that enhances feature interaction and representation diversity without exhaustive retraining. Experiments on the FLIR Aligned and M3FD datasets demonstrate consistent improvements over state-of-the-art end-to-end models, highlighting the potential of pretrained backbones coupled with adaptive attention mechanisms for robust multimodal object detection. The project code is released at https://github.com/LuBingyu11/MRFA.