DCA-CL: Enhancing Multimodal Emotion Recognition via Dual Cross Attention and Contrastive Learning

Abstract

Emotion is a subjective human response to external events or stimuli and plays a crucial role across various application domains. Consequently, emotion recognition has become a central focus of research. However, existing mainstream approaches still face several challenges, such as limited interaction across modalities and low recognition accuracy on limited samples of semantically similar but categorically distinct emotions. To tackle these challenges, we introduce a new multimodal emotion recognition framework, named DCA-CL (Dual Cross Attention with Contrastive Learning), which aims to improve the integration and effectiveness of cross-modal information. The proposed model incorporates a feature fusion network that combines bidirectional cross-modal attention with self-attention mechanisms, enabling effective modeling of both intra-modal and cross-modal interactions. A temporal gating mechanism filters salient features and suppresses redundant information, while dynamic weight allocation enables efficient fusion of modality-specific features. During training, a dynamic modality distillation mechanism selects the optimal teacher modality according to modality quality, guiding weaker modalities to learn high-quality semantic features and strengthening their representations. To improve recognition accuracy in few-shot settings and among semantically close emotion categories, we further incorporate a dynamic focal contrastive loss, which boosts the model's ability to learn discriminative representations. Experiments conducted on the IEMOCAP and MELD datasets confirm that the proposed DCA-CL framework delivers strong overall performance.
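For concreteness, the fusion stage described in the abstract can be sketched as follows. This is a minimal PyTorch illustration under our own assumptions (two temporally aligned modality streams, a sigmoid temporal gate, and softmax-based dynamic modality weights); the module and variable names are hypothetical and do not come from the authors' code.

    # Minimal sketch of a bidirectional cross-modal attention fusion block,
    # assuming two modality streams (e.g., text and audio) of equal length,
    # each of shape (batch, seq_len, dim). Illustrative only.
    import torch
    import torch.nn as nn

    class DualCrossAttentionBlock(nn.Module):
        def __init__(self, dim: int = 256, heads: int = 4):
            super().__init__()
            # Cross attention in both directions: text attends to audio and vice versa.
            self.text_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.audio_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
            # Self-attention to model intra-modal interactions on the fused sequence.
            self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            # Temporal gate: a sigmoid gate over time steps, one plausible reading
            # of "filter salient features and suppress redundant information".
            self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
            # Dynamic per-sample weight allocation over the two modality streams.
            self.modality_logits = nn.Linear(dim, 2)

        def forward(self, text: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
            # Bidirectional cross-modal attention.
            t2a, _ = self.text_to_audio(text, audio, audio)   # text queries audio
            a2t, _ = self.audio_to_text(audio, text, text)    # audio queries text
            # Gate each stream using evidence from both directions
            # (assumes both streams have the same sequence length).
            g = self.gate(torch.cat([t2a, a2t], dim=-1))
            t2a, a2t = g * t2a, (1 - g) * a2t
            # Dynamic weights for fusing modality-specific features.
            pooled = (t2a + a2t).mean(dim=1)                          # (batch, dim)
            w = torch.softmax(self.modality_logits(pooled), dim=-1)   # (batch, 2)
            fused = w[:, 0, None, None] * t2a + w[:, 1, None, None] * a2t
            # Intra-modal interactions on the fused representation.
            out, _ = self.self_attn(fused, fused, fused)
            return out

    # Usage: fuse 50-step text and audio sequences into one representation.
    block = DualCrossAttentionBlock(dim=256, heads=4)
    text, audio = torch.randn(8, 50, 256), torch.randn(8, 50, 256)
    print(block(text, audio).shape)  # torch.Size([8, 50, 256])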
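The dynamic focal contrastive loss can likewise be approximated by a supervised contrastive objective with a focal modulation term (1 - p)^gamma that up-weights hard positives, i.e., semantically close emotion pairs the model still confuses. The exact formulation in the paper may differ; the temperature and gamma values below are illustrative hyperparameters.

    # Hedged sketch of a focal-modulated supervised contrastive loss.
    import torch
    import torch.nn.functional as F

    def focal_supcon_loss(feats, labels, temperature=0.1, gamma=2.0):
        """feats: (batch, dim) embeddings; labels: (batch,) integer emotion ids."""
        feats = F.normalize(feats, dim=-1)
        sim = feats @ feats.T / temperature                     # (batch, batch)
        eye = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
        sim = sim.masked_fill(eye, float("-inf"))               # exclude self-pairs
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        log_prob = log_prob.masked_fill(eye, 0.0)               # avoid -inf * 0 = nan
        pos = ((labels[:, None] == labels[None, :]) & ~eye).float()
        # Focal modulation: well-separated positives (high pairwise probability p)
        # are down-weighted; hard positives keep close to full weight.
        focal = (1 - log_prob.exp()).clamp(min=0.0) ** gamma
        per_anchor = -(focal * log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
        return per_anchor[pos.sum(1) > 0].mean()                # skip anchors w/o positives

    # Usage: 16 embeddings over 4 emotion classes.
    loss = focal_supcon_loss(torch.randn(16, 128), torch.randint(0, 4, (16,)))
    print(loss)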