DASYOLO: Dual-Attention-Synergistic YOLO for Cross-Modality Object Detection

Abstract

The fusion of infrared and visible images effectively overcomes the limitations of single-modality object detection, offering significant advantages in adverse environments such as low illumination and haze. However, existing cross-modality detection methods predominantly adopt sequential fusion strategies in their attention mechanism design, which limits feature representation capability and computational efficiency. To address both shortcomings, we introduce a synergistic mechanism and design a dual-attention-synergistic cross-modality object detection network, DASYOLO. The network integrates a shallow feature enhancement module (BiAttention) and a cross-modal synergy attention module (DAS). First, BiAttention exploits the expressive potential of shallow features through two complementary attention mechanisms, providing a robust foundation for subsequent cross-modal feature interaction. Then, exploiting the semantic complementarity of different feature channels in their spatial distribution, DAS adopts a parallel architecture that captures channel importance and spatial significance simultaneously, combining the strengths of both attention mechanisms through an interactive learning strategy. This design provides spatial guidance, alleviates cross-modal semantic discrepancies, and strengthens the model's feature discrimination capability and inference efficiency. Qualitative and quantitative results demonstrate that the proposed method achieves favorable computational efficiency while maintaining high detection accuracy, significantly improving overall cross-modality detection performance.
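To make the parallel design concrete, the sketch below shows one plausible PyTorch reading of a DAS-style block. Everything here is an assumption extrapolated from the abstract: the class name, the two-stream (visible/infrared) interface, the squeeze-and-excitation channel branch, the CBAM-style spatial branch, and the multiplicative fusion are illustrative guesses, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class CrossModalSynergyAttention(nn.Module):
    """Hypothetical sketch of a DAS-style parallel attention block.

    Channel and spatial attention maps are computed simultaneously from
    the same fused feature (rather than one after the other, as in
    sequential CBAM-style designs) and applied jointly.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        fused = channels * 2  # visible + infrared concatenated along channels
        # Channel branch: global pooling followed by a bottleneck MLP
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial branch: channel-wise mean/max statistics -> 7x7 conv
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, vis: torch.Tensor, ir: torch.Tensor):
        x = torch.cat([vis, ir], dim=1)              # (B, 2C, H, W)
        # Both maps are derived from the same input, in parallel
        ca = self.channel_mlp(x)                     # channel importance, (B, 2C, 1, 1)
        sa = self.spatial_conv(torch.cat(
            [x.mean(dim=1, keepdim=True),
             x.max(dim=1, keepdim=True).values],
            dim=1))                                  # spatial significance, (B, 1, H, W)
        x = x * ca * sa                              # joint reweighting of the fused feature
        v, r = x.chunk(2, dim=1)                     # split back into the two modalities
        return vis + v, ir + r                       # residual per-modality outputs


# Usage: two aligned feature maps of the same shape
das = CrossModalSynergyAttention(channels=64)
vis_feat = torch.randn(2, 64, 80, 80)
ir_feat = torch.randn(2, 64, 80, 80)
out_vis, out_ir = das(vis_feat, ir_feat)
assert out_vis.shape == vis_feat.shape and out_ir.shape == ir_feat.shape
```

The point of the parallel layout, as sketched here, is that neither attention map is conditioned on the other's output: the two branches can run concurrently and their estimation errors do not compound, which is consistent with the abstract's claims about efficiency and feature discrimination.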
