Scalable Mixture-of-Experts Attention Feature Pyramid Network for Detection and Segmentation
Abstract
Attention mechanisms are widely used to address the multi-scale challenges common in remote sensing imagery for tasks such as object detection and instance segmentation. However, they often require task-specific network architectures tailored to different data targets. To better accommodate objects of varying scales and downstream task requirements, we propose a dynamic search framework based on a Mixture-of-Experts model. In this framework, each module is treated as an expert, and the combination of experts is adjusted flexibly so that the pyramid structure adapts to tasks at different scales. Specifically, we design a Scalable Bi-Directional Feature Pyramid Network (SBFPN), which incorporates several hybrid attention mechanisms to dynamically enhance feature fusion across layers. This approach captures long-range dependencies within the image while suppressing noise and interference from complex backgrounds. The scalable feature pyramid uses sparse feature fusion within each layer, computes self-attention weights between nodes, and retains or trims skip connections across layers. We evaluate the structure on object detection, instance segmentation, and panoptic segmentation, and show that it transfers to YOLO-based and R-CNN-based networks with improved detection and segmentation performance. On the Airbus Ship dataset, detection mAP rises from 71.3% to 82.7% and segmentation mAP rises from 62.4% to 71.1%, demonstrating the effectiveness of the proposed method. Code is available at https://github.com/chaibosong/SAFPN.
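The idea of treating modules as experts and fusing only a sparse subset of them can be illustrated with a small sketch. It is not the authors' implementation. It assumes a generic top-k softmax gate, and every name in it (`top_k_gate`, `mix_experts`, the three toy experts, the gate scores) is hypothetical. The experts are placeholder functions on a flat feature vector, standing in for the attention modules that the abstract mentions but does not specify.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(scores, k):
    """Keep the k largest gate weights, zero the rest, renormalise to sum to 1."""
    weights = softmax(scores)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep = set(order[:k])
    kept_total = sum(weights[i] for i in keep)
    return [weights[i] / kept_total if i in keep else 0.0
            for i in range(len(weights))]


def mix_experts(feature, experts, scores, k=2):
    """Fuse the outputs of the selected experts, weighted by the gate."""
    gate = top_k_gate(scores, k)
    fused = [0.0] * len(feature)
    for weight, expert in zip(gate, experts):
        if weight == 0.0:
            continue  # expert not selected: its branch is skipped entirely
        out = expert(feature)
        fused = [f + weight * o for f, o in zip(fused, out)]
    return fused, gate


# Hypothetical stand-ins for the attention modules (assumptions, not SBFPN blocks).
def identity_expert(x):
    return list(x)


def scale_expert(x):
    return [2.0 * v for v in x]


def mean_expert(x):
    m = sum(x) / len(x)
    return [m] * len(x)


feature = [1.0, 2.0, 3.0]
scores = [0.1, 2.0, 1.0]  # assumed gate logits; a real gate would be learned
fused, gate = mix_experts(
    feature, [identity_expert, scale_expert, mean_expert], scores, k=2
)
```

With `k=2`, the lowest-scoring expert receives a weight of exactly zero and is never evaluated. That is the sense in which the fusion is sparse in this sketch. A trained model would produce the gate scores from the input features rather than take them as constants.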