Enhancing RGB-IR Object Detection: A Frozen Backbone Approach with Multi-Receptive Field Attention


Abstract

Recent advances in multimodal object detection have relied predominantly on end-to-end training paradigms, which, while effective, demand substantial computational resources and risk degrading pretrained features. To address these challenges, we propose a frozen backbone paradigm that preserves pretrained representations as stable semantic anchors for efficient multimodal fusion. Our approach introduces a lightweight Multi-Receptive Field Attention (MRFA) mechanism that enhances feature interaction and representation diversity without exhaustive retraining. Experiments on the FLIR Aligned and M3FD datasets demonstrate consistent improvements over state-of-the-art end-to-end models, highlighting the potential of pretrained backbones coupled with adaptive attention mechanisms for robust multimodal object detection. The project code is released at https://github.com/LuBingyu11/MRFA.
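To make the frozen backbone idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation. The abstract does not describe the internals of MRFA, so the `MultiReceptiveFieldGate` class, its parallel dilated convolutions, and the sigmoid gating are illustrative assumptions. The single-convolution "backbones" are untrained stand-ins for real pretrained networks.

```python
import torch
import torch.nn as nn


def freeze(module: nn.Module) -> nn.Module:
    """Disable gradients and switch to eval mode so the weights stay fixed."""
    for p in module.parameters():
        p.requires_grad = False
    module.eval()
    return module


class MultiReceptiveFieldGate(nn.Module):
    """Hypothetical fusion block (an assumption, not the paper's MRFA):
    parallel dilated 3x3 convolutions see different receptive fields and
    jointly produce a gate that mixes RGB and IR features."""

    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        # padding == dilation keeps the spatial size unchanged for 3x3 kernels.
        self.branches = nn.ModuleList([
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])

    def forward(self, rgb_feat: torch.Tensor, ir_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb_feat, ir_feat], dim=1)
        # Average the branch responses, then squash to a gate in (0, 1).
        gate = torch.sigmoid(torch.stack([b(x) for b in self.branches]).mean(dim=0))
        return gate * rgb_feat + (1 - gate) * ir_feat


# Stand-in backbones (assumption): a real setup would load pretrained weights.
rgb_backbone = freeze(nn.Conv2d(3, 8, kernel_size=3, padding=1))
ir_backbone = freeze(nn.Conv2d(1, 8, kernel_size=3, padding=1))
fusion = MultiReceptiveFieldGate(channels=8)

rgb = torch.randn(2, 3, 16, 16)  # batch of RGB images
ir = torch.randn(2, 1, 16, 16)   # batch of aligned single-channel IR images
fused = fusion(rgb_backbone(rgb), ir_backbone(ir))

# Only the fusion block's parameters would be handed to the optimizer.
trainable = [p for p in fusion.parameters() if p.requires_grad]
```

In this sketch only the fusion block receives gradient updates, which is the property the frozen backbone paradigm relies on to reduce training cost and keep the pretrained features intact.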
