MambaRetinaNet: A Multi-Scale Convolution and Mamba Fusion-Based Remote Sensing Object Detection Model

Abstract

Object detection in remote sensing images remains difficult because of complex backgrounds and objects that vary widely in scale. Methods based on convolutional neural networks (CNNs) and self-attention have advanced the field, but each faces a fundamental limitation. CNNs are restricted by limited receptive fields, which leads to inadequate global feature representation. Self-attention captures long-range dependencies well, but its high computational complexity hampers practical efficiency and may weaken the representation of local detail features. To address these challenges, this article proposes MambaRetinaNet, a CNN-Mamba fusion-based detection model that uses a purpose-designed synergistic perception module (SPM) to model global information efficiently while enhancing local feature extraction. In addition, to improve the feature pyramid network (FPN), we introduce a differentiated feature processing strategy and, based on it, design an asymmetric feature pyramid, MambFPN, that balances detection accuracy and computational efficiency. Experimental results show that MambaRetinaNet performs strongly on four mainstream remote sensing datasets: its mean average precision (mAP) on DOTA-v1.0, DOTA-v1.5, DOTA-v2.0, and DIOR-R reaches 77.50, 70.21, 57.17, and 71.50, respectively, an average improvement of 11% over the baseline. Notably, on DOTA-v2.0, MambaRetinaNet outperforms the current state-of-the-art (SOTA) one-stage model by approximately 2 percentage points in mAP, validating the efficacy and generalizability of MambaRetinaNet in complex remote sensing scenarios.
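The abstract does not describe the internals of the SPM. As a rough illustration only, the pure-Python sketch below shows the general idea of pairing a convolutional branch (local detail, limited receptive field) with a Mamba-style selective state-space scan (global context gathered in one linear-time pass instead of the pairwise comparisons of self-attention). Everything here is an assumption for illustration, not the authors' design: the function names (`local_branch`, `selective_scan`, `global_branch`, `hypothetical_spm`), the single-channel 1D token sequence standing in for a flattened feature map, the scalar SSM parameters, the bidirectional scan, and fusion by element-wise sum.

```python
import math
from typing import List, Sequence


def softplus(v: float) -> float:
    # Numerically stable softplus: log(1 + e^v).
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def local_branch(x: List[float], kernel: Sequence[float]) -> List[float]:
    """CNN-style branch (assumed): 1D convolution with zero padding.

    Each output depends only on len(kernel) neighbouring tokens,
    i.e. the receptive field is limited by the kernel size.
    """
    r = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + j - r
            if 0 <= idx < len(x):
                acc += k * x[idx]
        out.append(acc)
    return out


def selective_scan(x: List[float], A: float = 1.0, B: float = 1.0, C: float = 1.0,
                   D: float = 0.0, w_delta: float = 1.0, b_delta: float = 0.0) -> List[float]:
    """Scalar Mamba-style selective SSM: a single O(L) recurrent pass.

    The step size delta_t depends on the current input (the "selective" part),
    so each token controls how much of the carried state is kept.
    A > 0 here and the decay is exp(-delta * A); this is a simplification.
    """
    h = 0.0
    ys = []
    for xt in x:
        delta = softplus(w_delta * xt + b_delta)  # input-dependent step size
        a_bar = math.exp(-delta * A)              # discretized state decay in (0, 1)
        b_bar = delta * B                         # simplified discretized input weight
        h = a_bar * h + b_bar * xt                # state carries context from all earlier tokens
        ys.append(C * h + D * xt)
    return ys


def global_branch(x: List[float]) -> List[float]:
    # Assumed bidirectional scan so every token sees context from both sides.
    fwd = selective_scan(x)
    bwd = selective_scan(x[::-1])[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


def hypothetical_spm(x: List[float], kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> List[float]:
    # Hypothetical fusion: element-wise sum of the local and global branches.
    local = local_branch(x, kernel)
    glob = global_branch(x)
    return [l + g for l, g in zip(local, glob)]


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 15
    loc = local_branch(impulse, [0.25, 0.5, 0.25])
    glo = global_branch(impulse)
    # Token 10 is far outside the 3-tap kernel but still reached by the scan.
    print(loc[10], glo[10] > 0.0)
```

With an impulse at token 0, the convolutional branch gives exactly zero at token 10, while the scan still propagates a decayed signal there. This mirrors the abstract's contrast between limited CNN receptive fields and efficient global modeling, without claiming to reproduce the SPM itself.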