Energy-Efficient and Adversarially-Resilient Underwater Object Detection via Adaptive Vision Transformers
Abstract
Underwater object detection is critical for marine resource utilization, ecological monitoring, and security, yet remains limited by optical degradation, high energy consumption, and adversarial vulnerabilities. To overcome these challenges, we propose an Adaptive Vision Transformer (A-ViT) detection framework that integrates hardware optimization, image enhancement, and lightweight detection. At the hardware level, a systematic approach to power modeling, endurance estimation, and material selection ensures feasibility across shallow- to deep-water missions. Image quality is improved via HAT-based super-resolution and DICAM staged enhancement, while detection accuracy is boosted through an improved YOLOv11-CA_HSFPN with coordinate attention and high-order feature fusion. A-ViT further employs dynamic token pruning and early-exit strategies to reduce latency and memory usage, combined with fallback and key-region preservation for robustness under attacks. Additionally, an Image-stage Attack QuickCheck (IAQ) module identifies adversarial frames and redirects them to a baseline detector, mitigating SlowFormer-induced latency and memory overflow. Experiments demonstrate superior performance in image quality metrics (UIQM, UCIQE), detection accuracy (mAP), and efficiency–robustness trade-offs, providing a practical foundation for next-generation underwater perception systems.
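As an illustration only, the interplay of token pruning, key-region preservation, and fallback routing described above might be sketched as follows. This is not the authors' implementation: the function names `prune_tokens` and `route_frame`, the score threshold, and the `max_keep_ratio` budget are hypothetical, and using the fraction of surviving tokens as the attack signal is an assumption made for this sketch, not the IAQ criterion from the paper.

```python
from typing import Iterable, List, Sequence, Tuple


def prune_tokens(scores: Sequence[float], threshold: float,
                 protected: Iterable[int] = ()) -> List[int]:
    """Return indices of tokens to keep.

    A token survives if its importance score reaches the threshold or if it
    belongs to a protected key region (hypothetical set of indices).
    """
    protected_set = set(protected)
    return [i for i, s in enumerate(scores)
            if s >= threshold or i in protected_set]


def route_frame(scores: Sequence[float], threshold: float = 0.5,
                protected: Iterable[int] = (),
                max_keep_ratio: float = 0.8) -> Tuple[str, List[int]]:
    """Decide which detector handles a frame.

    Assumption for this sketch: if pruning removes almost nothing, the frame
    may have been crafted to defeat pruning (a SlowFormer-style attack), so
    it is sent to the baseline detector instead of the adaptive path.
    """
    kept = prune_tokens(scores, threshold, protected)
    keep_ratio = len(kept) / len(scores) if scores else 0.0
    if keep_ratio > max_keep_ratio:
        return "baseline", kept
    return "adaptive", kept


if __name__ == "__main__":
    # Benign frame: most tokens are unimportant, token 2 is a protected key region.
    print(route_frame([0.9, 0.1, 0.2, 0.7, 0.05], protected=[2]))
    # Suspicious frame: every token scores high, so pruning saves nothing.
    print(route_frame([0.9, 0.9, 0.9, 0.9, 0.9]))
```

In this sketch the protected indices are always retained, which mirrors the idea of key-region preservation, while the keep-ratio check stands in for whatever test the actual IAQ module applies at the image stage.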