ARF-YOLO: Attention-Guided Adaptive Resolution-Aware Feature Learning for UAV Remote Sensing Object Detection
Abstract
Unmanned aerial vehicle (UAV)-based remote sensing object detection faces three fundamental bottlenecks: (1) insufficient resolution diversity in single-scale detection heads, causing irreversible spatial detail loss for small targets; (2) semantic gap accumulation in multi-scale feature fusion due to content-agnostic bilinear interpolation; and (3) inefficient feature resource allocation that treats all channels, spatial patches, and scale levels as equally important regardless of relevance. To address these challenges, we propose ARF-YOLO, a novel UAV detection framework built upon YOLOv11 with three synergistic innovations. The Attention-Guided Resolution Head (AGRH) incorporates the Multi-Perspective Feature Attention (MPFA) module, which simultaneously processes dual-resolution feature streams through multi-directional pooling-based attention to fuse semantic context with fine-grained spatial cues. The Adaptive Multi-Level Feature Fusion Module (AMFF) replaces bilinear upsampling with content-adaptive dynamic kernel generation (FAUS), structure-guided feature refinement (FRS), and learning-based cross-level weighting (AFFS). The Fast Scale Resource Assigner (FSRA), adapted from the global dynamic query framework for small target detection, is incorporated into our pipeline to dynamically allocate representation capacity along channels, spatial patches, and scale levels via three lightweight parallel assigners. We further propose the ARF-Scale-Aware Loss, which amplifies the supervisory signal for small objects through inverse-scale weighting. Extensive experiments on VisDrone2019 and UAVDT demonstrate that ARF-YOLO achieves 48.5% and 63.7% mAP@0.5, respectively, surpassing the YOLOv11 baseline by 5.1 and 5.4 percentage points with only +2.3M additional parameters (an 11.5% relative increase) while maintaining real-time inference at 101 fps.
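To illustrate the inverse-scale weighting idea behind the ARF-Scale-Aware Loss, the sketch below shows one plausible form: a per-box loss weight inversely proportional to the normalized box scale, so small objects contribute more to the gradient. The function name, the clipping bound, and the exact normalization are illustrative assumptions, not the paper's definition.

```python
import math

def scale_aware_weight(w, h, img_size=640.0, alpha=1.0, eps=1e-6, w_max=10.0):
    """Hypothetical inverse-scale weight: smaller boxes receive larger weights.

    w, h: box width/height in pixels; img_size: input resolution.
    alpha: global scaling factor; w_max caps the weight for tiny boxes
    to avoid exploding gradients (an assumed safeguard, not from the paper).
    """
    scale = math.sqrt(w * h) / img_size  # normalized box scale in (0, 1]
    return min(alpha / (scale + eps), w_max)

# An 8x8 box is weighted far more heavily than a 128x128 box:
small = scale_aware_weight(8.0, 8.0)      # raw weight 640/8 = 80, clipped to 10.0
large = scale_aware_weight(128.0, 128.0)  # roughly 640/128 = 5.0
```

Multiplying the per-box localization (and optionally classification) loss by such a weight amplifies supervision for small targets without changing the loss for medium and large objects much, which matches the stated goal of the ARF-Scale-Aware Loss.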