VPU-RTDETR: A Lightweight, Self-Adaptive, Real-Time Model for Small Object Detection on UAVs
Abstract
Small target detection in UAV aerial images faces significant challenges due to low resolution, complex backgrounds, and scale variations. The existing RT-DETR model has three weaknesses in this setting: insufficient feature extraction for small targets, inadequate capture of local information by the attention mechanism, and low sensitivity of the loss function. To address them, this paper proposes a lightweight and adaptive detection model named VPU-RTDETR. In the backbone network, the VASM module is introduced to achieve dynamic fusion of multi-scale features. In the encoder, the AIFI-Pola module uses a polarized linear attention mechanism to enhance global and local features simultaneously. In the feature fusion stage, the USOS scheme uses SPDConv and C-OKM modules to improve the utilization of low-resolution features. Additionally, a hybrid loss function based on FocalerIoU and MPDIoU is constructed to improve the localization accuracy of small targets. Experimental results show that, compared with the baseline model, VPU-RTDETR achieves a 3.1% improvement in mAP50 and a 2.4% improvement in mAP50:95 on the VisDrone2019 dataset, while maintaining real-time performance at 64 FPS and a relatively low parameter count, offering a favorable trade-off between accuracy and computational cost for detection on UAV platforms.
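The abstract does not state how FocalerIoU and MPDIoU are combined, so the following is only a minimal sketch of one plausible form. It assumes the pattern used in the Focaler-IoU literature for other IoU variants (base loss + IoU - Focaler-IoU), with MPDIoU as the base loss. The function names, the `(x1, y1, x2, y2)` box format, and the interval bounds `d` and `u` are illustrative assumptions, not values taken from the paper.

```python
def iou(box_a, box_b):
    """Plain IoU for two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mpdiou(pred, target, img_w, img_h):
    """MPDIoU: IoU minus the squared distances between matching corners
    (top-left and bottom-right), normalized by the squared image diagonal."""
    norm = img_w ** 2 + img_h ** 2
    d1 = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    d2 = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2
    return iou(pred, target) - d1 / norm - d2 / norm


def focaler_iou(iou_value, d=0.0, u=0.95):
    """Focaler-IoU: linearly remap IoU from the interval [d, u] onto [0, 1],
    clamping values outside the interval. d and u are assumed defaults."""
    if iou_value < d:
        return 0.0
    if iou_value > u:
        return 1.0
    return (iou_value - d) / (u - d)


def hybrid_loss(pred, target, img_w, img_h, d=0.0, u=0.95):
    """Assumed combination: L_MPDIoU + IoU - IoU_focaler.
    The paper's actual weighting may differ."""
    base = iou(pred, target)
    l_mpdiou = 1.0 - mpdiou(pred, target, img_w, img_h)
    return l_mpdiou + base - focaler_iou(base, d, u)
```

For example, `hybrid_loss((10, 10, 20, 20), (12, 11, 22, 21), 640, 640)` scores a slightly shifted prediction against its ground-truth box on a 640 x 640 image. In this sketch the corner-distance term keeps the loss informative even when two boxes do not overlap, a common situation for targets that occupy only a few pixels.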