Pedestrian Detection in Aerial Image Based on Convolutional Neural Network with Attention Mechanism and Multi-scale Prediction
Abstract
Pedestrian detection is crucial in intelligent systems such as traffic management and surveillance. Traditional machine learning methods suffer from low accuracy and slow processing. Convolutional Neural Network (CNN)-based algorithms have achieved notable progress, but mainstream CNNs still struggle with speed and accuracy, particularly for small and occluded targets seen from aerial perspectives. In this paper, we propose a Multi-Scale Attention YOLO (MSA-YOLO) algorithm to address these issues. MSA-YOLO incorporates a Squeeze, Excitation, and Cross Stage Partial (SECSP) channel attention module to extract richer pedestrian features with minimal additional parameters. A multi-scale prediction module is also introduced to capture information across different scales, improving small object detection and reducing missed detections. To evaluate our approach, we manually collect and annotate the Aerial Pedestrian Dataset (AP Dataset), which, to our knowledge, provides more annotations, more varied scenes, and more diverse view angles than comparable existing datasets. The high-resolution images in the AP Dataset capture more detailed pedestrian features, which can enhance model performance. Experimental results show that, on the AP Dataset, MSA-YOLO demonstrates clear advantages over several widely used object detection and pedestrian detection models developed in recent years, indicating potential benefits in both accuracy and efficiency.
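
For readers unfamiliar with channel attention, the following is a minimal sketch of a generic squeeze-and-excitation step, the idea that the SECSP module's name refers to. It is not the authors' SECSP implementation: the abstract does not specify the module's layout, the Cross Stage Partial part is omitted, and the function name, the weight matrices `w1` and `w2`, and the toy feature map are all assumptions chosen for illustration.

```python
import math

def se_channel_attention(x, w1, w2):
    """Generic squeeze-and-excitation gate (illustrative only).

    x:  feature map as a list of C channels, each an H x W nested list.
    w1: hypothetical bottleneck weights, shape (C/r) x C.
    w2: hypothetical expansion weights, shape C x (C/r).
    Returns the reweighted feature map and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one number.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h))))
         for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    y = [[[v * gate for v in row] for row in ch] for ch, gate in zip(x, s)]
    return y, s

# Toy input: two 2x2 channels, reduction ratio r = 2 (assumed values).
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, 1.0]]
w2 = [[0.0], [1.0]]
y, s = se_channel_attention(x, w1, w2)
```

The gate adds only the two small weight matrices, which is consistent with the abstract's claim of minimal additional parameters. In a trained network those weights are learned rather than set by hand.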