Disentangled Representation Learning with Temporal Smoothness Constraints for Multimodal Sentiment Analysis

Abstract

The goal of multimodal sentiment analysis is to efficiently identify and interpret human emotions by integrating multiple modalities (e.g., text, audio, and video). Traditional representation learning techniques often fail to adequately address inter-modal heterogeneity and temporal continuity, particularly as multimodal sentiment analysis tasks grow in complexity. Consequently, these methods struggle to achieve effective cross-modal fusion while mitigating redundant information and noise interference. To address these challenges, we propose DRTSC, a novel multimodal sentiment analysis framework. First, the framework employs disentangled representation learning to separate shared and private features, introduces a temporal smoothness loss that enforces consistency across consecutive audio and video features, and incorporates an adversarial loss with backward tuning. Second, a textual hierarchical guidance module coordinates audio and video emotional expressions by leveraging affective cues from text. Finally, efficient feature fusion is achieved through cross-modal interaction layers. Extensive experiments on the CMU-MOSI and CMU-MOSEI benchmarks demonstrate that the proposed model achieves state-of-the-art performance on sentiment analysis tasks.
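As an illustrative sketch only (the abstract does not give the exact formulation, and this is not the authors' code), a temporal smoothness constraint is commonly implemented as a penalty on first-order differences between consecutive feature frames of the audio or video stream. The PyTorch example below assumes sequence features shaped (batch, time, dim); the function name and shapes are hypothetical.

```python
import torch


def temporal_smoothness_loss(features: torch.Tensor) -> torch.Tensor:
    """Penalize abrupt changes between consecutive time steps.

    features: (batch, time, dim) sequence of audio or video features.
    Returns the mean squared difference between adjacent frames.
    """
    # Difference between each frame and its successor along the time axis.
    diffs = features[:, 1:, :] - features[:, :-1, :]
    return diffs.pow(2).mean()


# Example usage on randomly generated (hypothetical) audio features:
# a batch of 8 utterances, 50 time steps, 128-dimensional features.
audio_feats = torch.randn(8, 50, 128)
loss = temporal_smoothness_loss(audio_feats)
```

A term of this kind would typically be added, with a weighting coefficient, to the main sentiment prediction loss so that the learned audio and video representations vary smoothly over time.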