DASYOLO: Dual-Attention-Synergistic YOLO for Cross-Modality Object Detection

Abstract

The fusion of infrared and visible images effectively overcomes the limitations of single-modality object detection, offering significant advantages in adverse environments such as low illumination and haze. However, existing cross-modality detection methods predominantly adopt sequential fusion strategies in their attention mechanism design, which limits feature representation capability and computational efficiency. To address both shortcomings, we introduce a synergistic mechanism and design a dual-attention-synergistic cross-modality object detection network, DASYOLO. The network integrates a shallow feature enhancement module (BiAttention) and a cross-modal synergy attention module (DAS). First, BiAttention exploits the expressive potential of shallow features through two complementary attention mechanisms, providing a robust foundation for subsequent cross-modal feature interaction. Then, exploiting the semantic complementarity of different feature channels in their spatial distribution, DAS adopts a parallel architecture that captures channel importance and spatial significance simultaneously, combining the strengths of both attention mechanisms through an interactive learning strategy. This design provides spatial guidance, alleviates cross-modal semantic discrepancies, and strengthens the model's feature discrimination capability and inference efficiency. Qualitative and quantitative results demonstrate that the proposed method achieves favorable computational efficiency while maintaining high detection accuracy, significantly improving overall cross-modality detection performance.
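To make the parallel design concrete, the sketch below shows one plausible PyTorch reading of a DAS-style block. Everything here is an assumption extrapolated from the abstract: the class name, the two-stream (visible/infrared) interface, the squeeze-and-excitation channel branch, the CBAM-style spatial branch, and the multiplicative fusion are illustrative guesses, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class CrossModalSynergyAttention(nn.Module):
    """Hypothetical sketch of a DAS-style parallel attention block.

    Channel and spatial attention maps are computed simultaneously from
    the same fused feature (rather than one after the other, as in
    sequential CBAM-style designs) and applied jointly.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        fused = channels * 2  # visible + infrared concatenated along channels
        # Channel branch: global pooling followed by a bottleneck MLP
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial branch: channel-wise mean/max statistics -> 7x7 conv
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, vis: torch.Tensor, ir: torch.Tensor):
        x = torch.cat([vis, ir], dim=1)              # (B, 2C, H, W)
        # Both maps are derived from the same input, in parallel
        ca = self.channel_mlp(x)                     # channel importance, (B, 2C, 1, 1)
        sa = self.spatial_conv(torch.cat(
            [x.mean(dim=1, keepdim=True),
             x.max(dim=1, keepdim=True).values],
            dim=1))                                  # spatial significance, (B, 1, H, W)
        x = x * ca * sa                              # joint reweighting of the fused feature
        v, r = x.chunk(2, dim=1)                     # split back into the two modalities
        return vis + v, ir + r                       # residual per-modality outputs


# Usage: two aligned feature maps of the same shape
das = CrossModalSynergyAttention(channels=64)
vis_feat = torch.randn(2, 64, 80, 80)
ir_feat = torch.randn(2, 64, 80, 80)
out_vis, out_ir = das(vis_feat, ir_feat)
assert out_vis.shape == vis_feat.shape and out_ir.shape == ir_feat.shape
```

The point of the parallel layout, as sketched here, is that neither attention map is conditioned on the other's output: the two branches can run concurrently and their estimation errors do not compound, which is consistent with the abstract's claims about efficiency and feature discrimination.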
