Cross-Modal Object Detection from UAV Perspectives with Frequency Domain Fusion and Gated Feature Enhancement
Abstract
Drones can capture wide-area, multi-angle images from the air and are widely used for target detection. However, under adverse weather and complex lighting conditions, traditional single-spectrum visible-light imaging struggles to obtain effective target features, and when detecting small, dense targets it is prone to false or missed detections. This paper proposes a cross-modal object detection algorithm that combines frequency-domain fusion with gated feature enhancement. Based on the YOLOv12 backbone, a dual-branch feature fusion network is constructed to extract feature information from visible-light and infrared images respectively. A Dynamic Frequency-domain difference Fusion (DFF) mechanism is designed to retain both the high-frequency and low-frequency information of the images; by emphasizing the spatial differences between the two spectral features, it achieves deep feature fusion. To improve detection efficiency, a lightweight Scharr Gated Feature Enhancement (SGFE) module is designed to replace the corresponding module in the neck network. Experiments on the DroneVehicle dataset demonstrate the advantages of our method and the effectiveness of MEF-Net in enhancing multi-spectral feature extraction for ground detection by unmanned aerial vehicles.
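The abstract describes DFF as separating high- and low-frequency image content before fusing the two modalities. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch of the general idea: split each image in the FFT domain with a circular low-pass mask, then combine low frequencies from one modality with high frequencies (edges, fine detail) from the other. The function names, the `radius` parameter, and the choice of which modality contributes which band are assumptions for illustration, not the paper's method.

```python
import numpy as np

def freq_split(img, radius=8):
    """Split a 2-D image into low- and high-frequency components
    using a centered circular mask in the shifted FFT domain."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(f * (~mask))).real
    return low, high  # low + high reconstructs the input (for real input)

def fuse(vis, ir, radius=8):
    """Illustrative cross-modal fusion (an assumption, not the DFF
    formulation): low frequencies from the visible image plus
    high frequencies from the infrared image."""
    vis_low, _ = freq_split(vis, radius)
    _, ir_high = freq_split(ir, radius)
    return vis_low + ir_high
```

Because the two masks partition the spectrum, `low + high` exactly reconstructs the input, so no information is discarded by the split itself; the fusion step then decides per-band which modality to trust.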
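The SGFE module is named after the Scharr operator, a 3x3 edge filter with better rotational symmetry than Sobel. The sketch below shows one plausible reading of "Scharr gated" under stated assumptions: compute the Scharr gradient magnitude of a feature map and use a sigmoid of that magnitude as a multiplicative gate, so edge-rich positions are emphasized. The helper `conv2d`, the sigmoid gating, and the single-channel setting are my own illustrative choices, not the paper's implementation.

```python
import numpy as np

# Standard Scharr kernels (horizontal and vertical gradients).
SCHARR_X = np.array([[ 3, 0,  -3],
                     [10, 0, -10],
                     [ 3, 0,  -3]], dtype=float)
SCHARR_Y = SCHARR_X.T

def conv2d(img, k):
    """Naive 'same' 2-D convolution with zero padding (3x3 kernel)."""
    h, w = img.shape
    p = np.pad(img, 1)
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * k[::-1, ::-1])
    return out

def scharr_gate(feat):
    """Gate a feature map by its Scharr edge magnitude: a sigmoid of the
    gradient magnitude re-weights each position, emphasizing edges."""
    gx = conv2d(feat, SCHARR_X)
    gy = conv2d(feat, SCHARR_Y)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    gate = 1.0 / (1.0 + np.exp(-mag))  # sigmoid: 0.5 in flat regions, ->1 at edges
    return feat * gate
```

In a flat region the gradient is zero, so the gate is sigmoid(0) = 0.5 and the feature is uniformly attenuated, while strong edges pass through nearly unchanged; a learned variant would typically add trainable scaling before the sigmoid.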