Fusion Guard: A Multi-Scale Sequential Fusion Framework for Small Target Detection in Unmanned Aerial Vehicle Scenarios Using YOLO-World
Abstract
This paper presents Fusion Guard, a multi-scale sequence fusion framework built upon YOLO-World to enhance small target detection in UAV scenarios. To overcome the limitations of traditional models in feature convergence and detail preservation, Fusion Guard incorporates three key modules, Triple Feature Encoding, Scale Sequence Feature Fusion, and Feature Sum Aggregator, which collectively strengthen feature extraction and integration. Moreover, Selective Boundary Aggregation uses a bidirectional dynamic fusion approach to enhance feature complementarity. In addition, a small object detection layer integrated with high-resolution features significantly improves detection performance. To further improve target localization, the WIoU v3 loss function is incorporated into the model. On the VisDrone2019 dataset, the proposed model achieves a 5.7% increase in mAP@0.5 and a 3.2% improvement in mAP@0.5:0.95, and precision and recall each improve by about 5%. On the DOTA dataset, the model achieves notable performance gains while reducing parameters by 0.5M and model size by 0.7MB, and it also supports customizable detection categories. Overall, Fusion Guard demonstrates excellent performance and versatility in detecting small objects across various tasks.
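The reported metrics depend on box overlap thresholds: mAP@0.5 counts a detection as correct when its intersection over union (IoU) with the ground truth is at least 0.5, while mAP@0.5:0.95 averages over the COCO-style thresholds 0.50, 0.55, ..., 0.95. The sketch below is only an illustration of why small targets are hard under these metrics, not the authors' evaluation code. The function names and the example boxes are hypothetical, and boxes are assumed to be in `(x1, y1, x2, y2)` pixel format.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds averaged by mAP@0.5:0.95 (0.50 to 0.95 in steps of 0.05).
COCO_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """Return the IoU thresholds at which pred would count as a match for gt."""
    score = iou(pred, gt)
    return [t for t in COCO_THRESHOLDS if score >= t]


# Hypothetical example: the same 2-pixel horizontal shift applied to a
# 10x10 target and to a 100x100 target.
small_gt, small_pred = (0, 0, 10, 10), (2, 0, 12, 10)
large_gt, large_pred = (0, 0, 100, 100), (2, 0, 102, 100)

print(iou(small_pred, small_gt), thresholds_passed(small_pred, small_gt))
print(iou(large_pred, large_gt), thresholds_passed(large_pred, large_gt))
```

In this example the 2-pixel error leaves the small box matched at only four of the ten thresholds, while the large box still matches at all ten, which is why localization accuracy weighs so heavily on mAP@0.5:0.95 for small objects.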