Pedestrian Detection in Aerial Image Based on Convolutional Neural Network with Attention Mechanism and Multi-scale Prediction

Jiaxi Yang
Jiaquan Shen
Shitong Wang

Abstract

Pedestrian detection is crucial in intelligent systems such as traffic management and surveillance. Traditional machine learning methods suffer from low accuracy and slow processing. Algorithms based on Convolutional Neural Networks (CNNs) have achieved notable progress, but mainstream CNNs still struggle with speed and accuracy, particularly for small and occluded targets seen from aerial perspectives. In this paper, we propose a Multi-Scale Attention YOLO (MSA-YOLO) algorithm to address these issues. MSA-YOLO incorporates a Squeeze, Excitation, and Cross Stage Partial (SECSP) channel attention module to extract richer pedestrian features with minimal additional parameters. A multi-scale prediction module is also introduced to capture information across different scales, improving small object detection and reducing missed detections. To evaluate our approach, we manually collect and annotate the Aerial Pedestrian Dataset (AP Dataset), which, to our knowledge, provides more annotations, more varied scenes, and more diverse view angles than comparable existing datasets. The high-resolution images in the AP Dataset capture more detailed pedestrian features, which can enhance model performance. Experimental results show that, on the AP Dataset, MSA-YOLO demonstrates clear advantages over several widely used object detection and pedestrian detection models developed in recent years, indicating potential gains in both accuracy and efficiency.
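
The squeeze-and-excitation idea that the SECSP module's name refers to can be illustrated with a generic channel-attention sketch. The code below is not the authors' SECSP module, whose internal design the abstract does not describe; it shows only the standard steps of squeeze (global average pooling per channel), excitation (two small fully connected layers ending in a sigmoid), and rescaling of each channel by its gate. The function name `se_channel_attention`, the weight layout, and the toy values in the usage lines are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def se_channel_attention(feature_map, w_reduce, w_expand):
    """Generic squeeze-and-excitation channel attention (illustrative only).

    feature_map: list of C channels, each an H x W list of lists of floats.
    w_reduce:    hypothetical weights of shape (C // r) x C.
    w_expand:    hypothetical weights of shape C x (C // r).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: one global average per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: reduce -> ReLU -> expand -> sigmoid.
    hidden = [max(0.0, h) for h in matvec(w_reduce, squeezed)]
    gates = [sigmoid(g) for g in matvec(w_expand, hidden)]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return scaled, gates


# Toy usage: 2 channels of size 2 x 2, reduced to 1 hidden unit.
features = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]
w_reduce = [[0.5, -0.25]]
w_expand = [[1.0], [-1.0]]
scaled, gates = se_channel_attention(features, w_reduce, w_expand)
print(gates)
```

The extra parameters in such a block are only the two small weight matrices, which is consistent with the abstract's claim of minimal additional parameters, although the actual parameter count of SECSP is not given here.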

Version published to 10.21203/rs.3.rs-7333218/v1 on Research Square
Aug 27, 2025

Related articles

Real-Time Tiny Object Detection in UAV Aerial Images with Multi-Scale Attention Fusion

This article has 4 authors:
1. Junming Gao
2. Yanshan Zhang
3. Yuanzhang Fan
4. Bao Tian
This article has no evaluations
Latest version Sep 8, 2025

RoadNet: A High-Precision Transformer-CNN Framework for Road Defect Detection via UAV-Based Visual Perception

This article has 4 authors:
1. Long Gou
2. Yadong Liang
3. Xingyu Zhang
4. Jianfeng Yang
This article has no evaluations
Latest version Oct 9, 2025

Urban Road Defect Detection: A Hybrid EfficientNetV2-B0 and CBAM Framework with Real-Time Computer Vision Optimization

This article has 5 authors:
1. Sarah Ezz
2. Nashaat M. Hussain Hassan
3. Ayman Mahmoud Othman
4. Ahmed Monier
5. Ahmed Ehab
This article has no evaluations
Latest version Sep 1, 2025
