ECBAM-CVT-SOD: An Enhanced YOLOv8 Architecture with Multimodal Attentional Fusion for Innovative Low-altitude Remote Sensing in Small Object Detection

Abstract

This paper addresses the technical challenges of low recognition accuracy and high miss rates in small object detection within UAV aerial images by proposing YOLOv8s-SOD, a deeply optimized model based on the YOLOv8 framework. At the architectural level, three core modules are developed. First, an Enhanced Convolutional Block Attention Module (ECBAM) with dual-channel optimization is introduced in the feature extraction stage; by coupling channel recalibration with spatial attention, it significantly strengthens the network's ability to characterize low signal-to-noise-ratio small targets. Second, a Cross-layer Deformable Feature Interaction Module (CVT) is designed for the feature fusion stage; by using dynamic deformable convolution to construct multi-scale feature correlation matrices, it mitigates the insufficient information fusion caused by drastic target scale variations in aerial scenes. Third, a High-dimensional Feature Analytical Detection Head (SOD-H) is developed around the geometric and semantic characteristics of small targets, optimizing spatial-semantic information capture through nonlinear mappings in a high-dimensional feature space. To validate these improvements, a systematic verification framework is applied: 1) ablation experiments using the controlled-variable method quantify the individual contribution of each module and their joint optimization effect; 2) Class Activation Mapping (CAM) visualizations illustrate the enhanced attention allocation and feature focusing; and 3) comprehensive benchmark comparisons are conducted on the public VisDrone UAV dataset and a self-constructed complex-scenario dataset. Experimental results show that, compared with the baseline model, YOLOv8s-SOD improves mAP@0.5 by 4% while exhibiting superior robustness in extreme scale-variation scenarios.
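
The channel-recalibration-plus-spatial-attention coupling described for ECBAM follows the general pattern of CBAM-style blocks. The PyTorch sketch below illustrates that pattern only; the paper's exact ECBAM internals (its "dual-channel optimization", module placement, and hyperparameters) are not specified in this abstract, so the class names, reduction ratio, and kernel size here are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel recalibration: pool spatial dims, then re-weight each channel."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        attn = torch.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
        return x * attn

class SpatialAttention(nn.Module):
    """Spatial attention: weight each location from pooled channel statistics."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg_map = x.mean(dim=1, keepdim=True)
        max_map, _ = x.max(dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn

class CBAMBlock(nn.Module):
    """Channel attention followed by spatial attention (standard CBAM ordering)."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.channel = ChannelAttention(channels, reduction)
        self.spatial = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.spatial(self.channel(x))

if __name__ == "__main__":
    feat = torch.randn(1, 256, 40, 40)        # e.g. one backbone/neck feature map
    print(CBAMBlock(256)(feat).shape)          # torch.Size([1, 256, 40, 40])
```

In such a block the attention maps rescale, rather than replace, the input features, which is why it can be dropped into an existing YOLOv8 stage without changing tensor shapes.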
