SRTSOD-YOLO: Stronger Real-Time Small Object Detection Algorithm Based on Improved YOLO11 for UAV Imageries
Abstract
To address the difficulty of extracting small-target features, complex background interference, high missed-detection rates, and strict real-time requirements in UAV aerial images, this paper proposes the SRTSOD-YOLO series of models, built on YOLO11. The backbone integrates a Multi-scale Feature Complementary Aggregation Module (MFCAM), which alleviates the loss of small-target information as network depth increases. By combining channel and spatial attention mechanisms with convolutional feature extraction at different scales, MFCAM effectively captures the positions of small objects in the image. We also design a new neck architecture, the Gated Activation Convolutional Fusion Pyramid Network (GAC-FPN), which highlights important features and suppresses irrelevant background information during multi-scale feature fusion. GAC-FPN uses three main strategies to improve small-target detection: adding a detection head with a small receptive field while removing the original head with the largest receptive field, making full use of large-scale features, and using a gated activation convolutional module. To handle the imbalance between positive and negative samples in the image, an adaptive threshold focal loss replaces the original binary cross-entropy loss in the detection head, which accelerates network convergence. Finally, to suit different application scenarios, we derive four versions of SRTSOD-YOLO by varying the width and depth of the network modules: two small models (SRTSOD-YOLO-n and SRTSOD-YOLO-s), a medium model (SRTSOD-YOLO-m), and a large model (SRTSOD-YOLO-l).
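The motivation for replacing binary cross-entropy with a focal-style loss can be illustrated with a minimal sketch. The abstract does not give the adaptive threshold formulation, so the code below uses the standard focal loss instead; `gamma=2.0` and `alpha=0.25` are the conventional defaults of that standard loss and are assumptions here, not values taken from SRTSOD-YOLO.

```python
import math


def bce(p, y):
    """Binary cross-entropy for one predicted probability p in (0, 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal(p, y, gamma=2.0, alpha=0.25):
    """Standard focal loss (NOT the paper's adaptive threshold variant).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    examples, so the many easy background samples contribute less.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical samples: an easy background cell and a hard small object.
easy_negative = (0.05, 0)  # low score on background: already correct
hard_positive = (0.10, 1)  # low score on a real object: badly wrong

for name, loss in (("bce", bce), ("focal", focal)):
    ratio = loss(*hard_positive) / loss(*easy_negative)
    print(f"{name}: hard/easy loss ratio = {ratio:.1f}")
```

Under binary cross-entropy the hard positive weighs a few dozen times more than one easy negative; under the focal form the ratio grows by roughly two orders of magnitude, which is why such losses help when background samples vastly outnumber objects.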
Experiments on two datasets, VisDrone2019 and UAVDT, show that SRTSOD-YOLO-n improves mAP50 over YOLO11n by 3.1% and 1.2%, respectively, and that SRTSOD-YOLO-l improves mAP50 over YOLO11l by 7.9% and 3.3%, respectively. Compared with other existing methods, SRTSOD-YOLO-l achieves the highest detection accuracy while maintaining real-time performance, demonstrating the superiority of the proposed method.
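For readers unfamiliar with the metric, mAP50 is mean average precision with a detection counted as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows only that matching criterion, not the full mAP computation; the box format `(x1, y1, x2, y2)` and the helper names are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if boxes are disjoint
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """Matching rule behind mAP50: IoU must reach the 0.5 threshold."""
    return iou(pred, gt) >= threshold


# Hypothetical 10x10-pixel object and two slightly shifted predictions.
gt = (0, 0, 10, 10)
print(is_true_positive((3, 0, 13, 10), gt))  # 3-pixel shift
print(is_true_positive((4, 0, 14, 10), gt))  # 4-pixel shift
```

For a 10x10-pixel object, a 3-pixel horizontal shift still passes the threshold (IoU 70/130) while a 4-pixel shift fails (IoU 60/140), which shows how little localization error small objects tolerate.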