Vehicle detection algorithm based on improved RT-DETR
Abstract
Vehicle detection algorithms are integral to intelligent traffic management and AI-assisted driving systems. However, the complexity and variability of traffic scenarios present significant challenges. Spatial pyramid pooling enhances a model’s ability to capture multi-scale contextual information by offering different levels of feature representation, which is particularly beneficial in complex traffic scenarios. In this study, we extend spatial pyramid pooling to Transformer-based models and introduce a linear spatial pyramid attention (LSPA) mechanism. This approach allows the model to focus on both global information and local details when processing vehicle image data. Additionally, we propose a spatial coordination filter (SCF) module to explicitly summarize spatial global dependencies and reduce spatial feature redundancy. To address computational costs, we employ partial convolution (PConv) in the model’s backbone network to minimize redundant computation. Experiments conducted on the BDD100K and KITTI datasets demonstrate that our model improves the mean average precision (mAP@50) by 2.7% and reduces model parameters by approximately 16% compared to the existing RT-DETR model. These findings confirm the efficacy and advantages of our proposed method.
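The saving that partial convolution targets can be illustrated with simple weight counting. The sketch below is not the authors' implementation. It assumes that PConv here means the channel-subset design in which a regular convolution is applied to only a fraction of the input channels while the remaining channels pass through unchanged. The function names, the 1/4 channel ratio, and the 64-channel, 3x3 example are all hypothetical choices made for illustration.

```python
def conv_weights(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def pconv_weights(channels, k, ratio=0.25):
    """Weight count when only `ratio` of the channels are convolved.

    `ratio` is an assumed hyperparameter, not a value taken from the paper.
    """
    c_p = int(channels * ratio)
    return conv_weights(c_p, c_p, k)


def partial_conv(channels, conv_fn, ratio=0.25):
    """Apply conv_fn to the first `ratio` of channels; pass the rest through.

    `channels` is a list of per-channel feature maps, and `conv_fn` is any
    callable that maps a list of channels to a list of the same length.
    """
    c_p = int(len(channels) * ratio)
    return conv_fn(channels[:c_p]) + channels[c_p:]


if __name__ == "__main__":
    full = conv_weights(64, 64, 3)   # 36864 weights
    part = pconv_weights(64, 3)      # 2304 weights
    print(full, part, part / full)   # ratio is 0.0625

    # Toy "convolution" that doubles every value, to show the pass-through.
    double = lambda chs: [[2 * v for v in ch] for ch in chs]
    print(partial_conv([[1], [2], [3], [4]], double))
```

Under these assumptions, convolving a quarter of the channels needs one sixteenth of the weights of the full layer, because the cost scales with the product of input and output channel counts. The reduction for a whole model is smaller, since only some layers are replaced and the untouched channels still have to be mixed by later layers.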