MMT-NET: A Lightweight Multimodal Fusion Network for UAV Target Detection in Adverse Environments

Abstract

UAV target detection loses accuracy in adverse environments such as low light, dense fog, and extreme weather. To address this, the paper proposes MMT-NET, a lightweight multi-modal fusion network that significantly improves detection performance in such conditions by fusing the complementary characteristics of infrared and visible light images within a lightweight design. MMT-NET is based on the RT-DETR framework and uses MobileNetV4 as the backbone network. It extracts infrared and visible light features through independent paths and incorporates a Lightweight Multi-modal Feature Interaction Module (LMFIM) to enable feature interaction between the modalities. Additionally, a lightweight cross-modal attention fusion module (LCMAF) is designed, which performs feature fusion using a spatial attention mechanism while reducing computational complexity. Experimental results show that, on the public multi-modal dataset M3FD, MMT-NET achieves mAP50 and mAP50:95 scores of 89.9% and 60.3%, respectively, and satisfies real-time inference on the NVIDIA Jetson TX2 NX platform.
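
The abstract does not give the internals of LCMAF, so the following is only a minimal sketch of the general idea of spatial-attention fusion of infrared and visible feature maps, not the authors' module. It assumes two feature maps of identical shape (channels, height, width), scores each modality per pixel by its channel-wise mean, and turns the two scores into fusion weights with a softmax. The function name `spatial_attention_fuse`, the scoring rule, and the random inputs are all hypothetical choices made for illustration.

```python
import numpy as np


def spatial_attention_fuse(ir, vis):
    """Fuse two (C, H, W) feature maps with per-pixel softmax weights.

    Illustrative only: the scoring rule (channel-wise mean) is an
    assumption, not the LCMAF design from the paper.
    """
    if ir.shape != vis.shape:
        raise ValueError("feature maps must share the same shape")
    # One saliency score per pixel and modality: the channel-wise mean.
    scores = np.stack([ir.mean(axis=0), vis.mean(axis=0)])  # (2, H, W)
    # Softmax across the modality axis, shifted for numerical stability.
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)
    # Each (H, W) weight map broadcasts over the channel axis.
    fused = weights[0] * ir + weights[1] * vis
    return fused, weights


# Hypothetical stand-ins for backbone outputs of the two branches.
rng = np.random.default_rng(0)
ir_feat = rng.standard_normal((8, 4, 4))
vis_feat = rng.standard_normal((8, 4, 4))
fused, weights = spatial_attention_fuse(ir_feat, vis_feat)
```

Because the weights are a single (H, W) map per modality rather than a full channel-by-channel attention matrix, the fusion costs only a few element-wise operations per pixel, which is the kind of saving a lightweight fusion module aims for. A learned module would typically replace the fixed channel mean with a small convolution.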
