A Dynamic Weighted Fusion Model for Multimodal Sentiment Analysis


Abstract

Multimodal sentiment analysis (MSA) tasks leverage diverse data sources, including text, audio, and visual data, to infer users' sentiment states. Previous research has mainly focused on capturing the differences and consistency of sentiment information across modalities, emphasizing cross-modal interaction while neglecting in-depth exploration of the sentiment information within individual modalities. Additionally, existing MSA methods rarely examine the contribution of each modality to model performance. To address these issues, this paper proposes a dynamic weighted fusion model for multimodal sentiment analysis. Specifically, we first design a multi-level semantic enhancement (MLSE) module for each individual modality: it creates three copies of each modality's features and captures local and global emotional information using convolutional neural networks and attention mechanisms, aiming to extract semantic information from multiple perspectives and levels within a single modality. Subsequently, we design a genetic algorithm module tailored to multimodal sentiment analysis, which dynamically computes the optimal weight of each modality during model training and selects the modality with the maximum weight as the primary modality, treating the other two as auxiliary modalities. We conduct extensive experiments on three benchmark datasets (CMU-MOSI, CMU-MOSEI, and CH-SIMS), and the results demonstrate that our proposed model outperforms state-of-the-art models across various metrics.
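The abstract sketches two components: a multi-level semantic enhancement (MLSE) block that processes several copies of each modality with convolutions and attention, and a genetic algorithm that searches for per-modality fusion weights. The PyTorch sketch below is only a minimal illustration of those two ideas; the class and function names, the three convolution kernel sizes, the pooling choices, and the simplified selection-and-mutation loop are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLSE(nn.Module):
    """Illustrative multi-level semantic enhancement block (an assumption, not the
    paper's code): three parallel views of one modality, each combining a 1-D
    convolution (local cues) with self-attention (global cues)."""

    def __init__(self, dim, heads=4):          # dim must be divisible by heads
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=k, padding=k // 2) for k in (1, 3, 5)
        )
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, x):                      # x: (batch, seq_len, dim)
        views = []
        for conv in self.convs:                # one "copy" of the modality per branch
            local = conv(x.transpose(1, 2)).transpose(1, 2)   # local features
            global_feat, _ = self.attn(local, local, local)   # global features
            views.append(global_feat.mean(dim=1))             # pool over time
        return self.proj(torch.cat(views, dim=-1))            # (batch, dim)


def evolve_modality_weights(fitness_fn, pop_size=20, generations=30, sigma=0.05):
    """Toy genetic-algorithm search over three modality weights on the simplex.
    `fitness_fn(weights)` is assumed to return a validation score to maximize;
    the selection/mutation scheme here is a deliberate simplification."""
    pop = torch.softmax(torch.randn(pop_size, 3), dim=-1)
    for _ in range(generations):
        scores = torch.tensor([fitness_fn(w) for w in pop])
        elite = pop[scores.topk(pop_size // 2).indices]        # selection
        children = elite + sigma * torch.randn_like(elite)     # mutation
        pop = F.relu(torch.cat([elite, children])) + 1e-8
        pop = pop / pop.sum(dim=-1, keepdim=True)              # back onto the simplex
    scores = torch.tensor([fitness_fn(w) for w in pop])
    return pop[scores.argmax()]  # largest entry would mark the primary modality
```

In a setup like this, the weight vector returned by the search would scale each modality's representation before fusion, with the largest-weight modality playing the primary role described in the abstract.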
