Robust Small Object Detection on Water Surfaces via Multi-Scale Contextual Attention and Channel-Normalized Feature Aggregation

AnChuan Wang
Ling Qin
Qing Huang
Qun Zou

Abstract

Reliable perception of small floating objects is fundamental to the autonomous navigation of Unmanned Surface Vehicles (USVs) and Search and Rescue (SAR) operations. However, detection in dynamic water-surface environments remains a formidable challenge as high-frequency wave clutter frequently obscures targets and illumination-induced covariate shifts destabilize feature distributions. To address these inherent limitations, this study proposes YOLO11-MCN, a real-time detection framework introducing two novel architectural components tailored for water surface monitoring. First, the Multi-scale Contextual Attention (MSCA) module disentangles the target signatures from repetitive background noise. Unlike conventional attention mechanisms, the MSCA explicitly aggregates contextual information across heterogeneous receptive fields to suppress wave-generated false positives. Second, the Channel Normalization Attention Mechanism (CNAM) provides a targeted solution for illumination instability. Leveraging Group Normalization for feature statistics calibration, CNAM effectively mitigates covariate shifts from extreme lighting conditions. These core innovations are complemented by a high-resolution P2 detection head, recovering the geometric details of small-scale targets (<32 × 32 pixels) typically lost during deep downsampling. Extensive experiments on a dataset containing 5,812 images demonstrate that YOLO11-MCN achieves a state-of-the-art mAP@0.5 of 92.7%, surpassing the YOLO11n baseline by 5.9 percentage points. Robustness evaluations confirm that the designs of MSCA and CNAM significantly reduce missed detections under severe wave clutter and backlighting conditions. With a recall of 90.5% and an inference speed of 94 FPS, the proposed method provides a robust and efficient solution suitable for USV perception, with a model complexity (3.9M parameters) compatible with further edge-device optimization.
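
The abstract attributes CNAM's illumination robustness to Group Normalization but does not give the module's internals. The sketch below is therefore not the authors' CNAM. It is a minimal NumPy illustration, under stated assumptions, of why group-wise statistics calibration helps: a global brightness change modeled as `a * x + b` with `a > 0` leaves the normalized features essentially unchanged. The function names, the sigmoid gate, and the group count are hypothetical choices made for illustration only.

```python
import numpy as np


def group_norm(x, num_groups, eps=1e-5):
    """Group Normalization for one feature map of shape (C, H, W).

    Channels are split into `num_groups` groups; each group is shifted and
    scaled to zero mean and (approximately) unit variance.
    """
    c, h, w = x.shape
    assert c % num_groups == 0, "channel count must be divisible by num_groups"
    g = x.reshape(num_groups, -1)
    mean = g.mean(axis=1, keepdims=True)
    var = g.var(axis=1, keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(c, h, w)


def gn_channel_attention(x, num_groups=4):
    """Hypothetical channel gate built on group-normalized features.

    Assumption: each channel is weighted by a sigmoid of its mean absolute
    normalized response. The paper's actual gating rule is not specified
    in the abstract.
    """
    z = group_norm(x, num_groups)
    score = np.abs(z).mean(axis=(1, 2))      # one score per channel, shape (C,)
    gate = 1.0 / (1.0 + np.exp(-score))      # sigmoid, values in (0, 1)
    return z * gate[:, None, None]


# Usage: simulate a lighting change as an affine shift of the features.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16, 16))
brighter = 3.0 * features + 5.0
same = np.allclose(
    gn_channel_attention(features), gn_channel_attention(brighter), atol=1e-3
)
print("output stable under affine brightness change:", same)
```

The invariance holds only for shifts that are uniform within a group, which is a simplification of real backlighting; spatially varying glare would not cancel this way.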

Version published to 10.21203/rs.3.rs-8913996/v1 on Research Square
Feb 26, 2026

Related articles

Multiscale Feature Optimization for Accurate Small Object Detection in Remote Sensing Imagery
Authors: Bingxiang Wang, Mugen Zhou, Wenzhuo Ma, Tianyu Li, Changsheng Zhu
This article has no evaluations. Latest version: Apr 17, 2026

ESO-YOLO: Enhanced Small Object Detection Algorithm from Multiple Perspectives
Authors: Dong Wu, Wenhao Guan, Bingjie Zhang, Hao Chen
This article has no evaluations. Latest version: Apr 13, 2026

A feature enhancement and attention fusion network for small object detection in UAV imagery
Authors: Xilong Xu, Peng Li, Hongwei Ding, Jinhua Yang
This article has no evaluations. Latest version: Mar 23, 2026
