Fusion Guard: A Multi-Scale Sequential Fusion Framework for Small Target Detection in Unmanned Aerial Vehicle Scenarios Using YOLO-World

Abstract

This paper presents Fusion Guard, a multi-scale sequence fusion framework built upon YOLO-World to enhance small target detection in UAV scenarios. To overcome the limitations of traditional models in feature convergence and detail preservation, Fusion Guard incorporates three key modules, Triple Feature Encoding, Scale Sequence Feature Fusion, and Feature Sum Aggregator, which collectively strengthen feature extraction and integration. Moreover, Selective Boundary Aggregation uses a bidirectional dynamic fusion approach to enhance feature complementarity. In addition, integrating a small object detection layer with high-resolution features significantly improves detection performance. To further improve target localization, the WIoU v3 loss function is incorporated into the model. Experimental results show that the proposed model achieves a 5.7% increase in mAP@0.5 and a 3.2% improvement in mAP@0.5:0.95 on the VisDrone2019 dataset. Precision and recall also improve by about 5%. On the DOTA dataset, the model achieves notable performance enhancements while reducing parameters by 0.5M and model size by 0.7MB, and it also supports customizable detection categories. Overall, Fusion Guard demonstrates excellent performance and versatility in detecting small objects across various tasks.
