Real-Time Tiny Object Detection in UAV Aerial Images with Multi-Scale Attention Fusion
Abstract
With the rapid advancement of unmanned aerial vehicle (UAV) technology, the demand for efficient and accurate object detection algorithms has become increasingly urgent. However, UAV aerial images present numerous challenges, including irregular target shapes, frequent occlusion, and stringent real-time requirements. These factors limit the performance of existing detection algorithms in practical applications. To address these issues, this paper proposes MCFA-Net, a multi-scale contextual feature aggregation network that integrates Transformer and Convolutional Neural Network (CNN) techniques. Specifically, YOLOv8 serves as the backbone and is enhanced with an Attention-based Intrascale Feature Interaction (AIFI) module, which uses self-attention to improve small-object recognition across different scales. In the neck, a lightweight multi-resolution feature pyramid network (MRFPN) is designed to strengthen multi-scale feature fusion, while the Dynamic Detection Head (DyHead) incorporates adaptive attention to improve robustness in dense and small-object scenarios. Comprehensive experiments on the VisDrone2019 dataset, including ablation studies, comparative analyses, and interpretability evaluations, demonstrate the effectiveness of the proposed method. MCFA-Net raises mAP@0.5 and mAP@0.5:0.95 by 21% and 23.4%, respectively, while also increasing inference speed from 105 FPS to 118 FPS. Furthermore, validation on the AI-TOD dataset confirms the robustness and generalization capability of the model.
Keywords: tiny object detection; UAV imagery; multi-scale feature fusion; MCFA-Net
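The abstract describes AIFI only as applying self-attention within a single scale. As a rough illustration of that idea, the sketch below flattens one feature map into one token per spatial position, lets every position attend to every other, and reshapes the result back. It assumes an RT-DETR-style design and is not the authors' implementation: the function name `intrascale_self_attention` and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical, the weights are random rather than learned, and positional embeddings, multiple heads, normalization, and the feed-forward sublayer are all omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def intrascale_self_attention(feat, w_q, w_k, w_v):
    """Single-head self-attention over all spatial positions of ONE feature map.

    feat: array of shape (C, H, W).
    w_q, w_k, w_v: (C, C) projection matrices (hypothetical; random here, not learned).
    Returns an array of shape (C, H, W) that includes a residual connection.
    """
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T          # (H*W, C): one token per pixel
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(c)              # (H*W, H*W) pairwise affinities
    attn = softmax(scores, axis=-1)            # each row is a distribution over positions
    out = tokens + attn @ v                    # residual: input tokens + attended values
    return out.T.reshape(c, h, w)              # restore the (C, H, W) layout

rng = np.random.default_rng(0)
C, H, W = 8, 5, 5
feat = rng.standard_normal((C, H, W))
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
out = intrascale_self_attention(feat, w_q, w_k, w_v)
print(out.shape)  # (8, 5, 5)
```

Because attention here is computed only among positions of the same map, its cost grows with (H·W)², which is one plausible reason such modules are typically applied to a coarse, low-resolution level rather than to every scale.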