ESAK-YOLO: Multi-Scale Attention Enhanced Object Detection for Floating Waste in Complex Water Scenes
Abstract
In complex aquatic environments, accurate detection of floating debris remains challenging because of severe occlusion, large scale variations, and strong background interference, which prevent existing object detection models from effectively modeling spatial features, adapting to scale changes, and capturing object shapes. To strengthen the weak multi-scale feature representation of surface debris, an efficient multi-scale attention (EMA) module is designed that enhances semantic modeling through grouped feature modeling and cross-spatial information fusion. To improve the perception of objects of different sizes, a selective kernel attention (SKA) mechanism is designed that lets the network dynamically adjust its receptive field according to contextual information and accurately capture scale-variant features. To handle the irregular morphology of floating debris, an arbitrary kernel convolution (AKConv) module is proposed that removes the fixed-shape constraint of conventional convolution kernels and improves adaptability to complex structures. The network that integrates these three designs is termed ESAK-YOLO. Experimental results show that ESAK-YOLO improves mAP by 8.14% over the baseline YOLOv7 and by up to 13.78% over five other models, demonstrating the effectiveness of the proposed method.
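The selective-kernel idea mentioned in the abstract (branches with different receptive fields, re-weighted according to context) can be illustrated with a minimal sketch. This is not the ESAK-YOLO implementation: it assumes a single-channel 1-D signal instead of 2-D feature maps, and the `branch_weights` argument is a hypothetical stand-in for the learned fully connected layers that would normally produce the branch logits. The function names are likewise illustrative.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation that keeps the input length (odd kernels)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def selective_kernel_fuse(signal, kernels, branch_weights):
    """Split -> fuse -> select, in the style of selective kernel attention.

    branch_weights: one hypothetical scalar per branch, standing in for the
    learned layers that map the global descriptor to branch logits.
    """
    # Split: one branch per kernel size, i.e. per receptive field.
    branches = [conv1d_same(signal, k) for k in kernels]
    # Fuse: element-wise sum of the branches, then global average pooling.
    fused = [sum(values) for values in zip(*branches)]
    descriptor = sum(fused) / len(fused)
    # Select: softmax across branches gives context-dependent weights.
    attention = softmax([w * descriptor for w in branch_weights])
    output = [
        sum(a * branch[i] for a, branch in zip(attention, branches))
        for i in range(len(signal))
    ]
    return output, attention


if __name__ == "__main__":
    impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
    kernels = [[1 / 3] * 3, [1 / 5] * 5]  # small and large receptive field
    out, attn = selective_kernel_fuse(impulse, kernels, [1.0, -1.0])
    print("attention per branch:", attn)
    print("fused output:", out)
```

With equal branch weights the two receptive fields contribute equally. Unequal weights shift the mix toward one kernel size depending on the pooled descriptor, which is the sense in which the receptive field is adjusted dynamically.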