Granularity-Guided Fusion for Multi-Modal Sentiment Understanding

Abstract

Multimodal sarcasm detection identifies sarcasm from multiple modalities of information; the key challenge is modeling incongruity both within and between modalities. Existing methods often focus on inter-modal incongruity while underexploiting the semantic information available within each modality. To address this, we propose the Granularity-Based Inter and Intra-Modal Fusion Network (GIIFN). GIIFN integrates handcrafted image descriptors with deep learning models to extract comprehensive semantic information from images, and leverages a pre-trained language model to enrich image analysis with large-scale textual knowledge. Moreover, its feature interaction module fuses features at different granularities, capturing both fine details and broader contextual information. Extensive experiments demonstrate that our method outperforms existing approaches and achieves state-of-the-art results in multimodal sarcasm detection.
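The abstract does not specify the fusion architecture, so the sketch below is only an illustration of the general idea, not the authors' method: a PyTorch module that combines handcrafted and deep image features via intra-modal self-attention, fuses them with pre-trained language-model token features via inter-modal cross-attention, and pools at two granularities before classification. All module names, feature dimensions, and the attention-based fusion scheme are assumptions.

```python
import torch
import torch.nn as nn


class GranularityFusion(nn.Module):
    """Hypothetical sketch of granularity-based inter/intra-modal fusion.

    The actual GIIFN architecture may differ; dimensions below
    (128 for handcrafted descriptors, 2048 for CNN features,
    768 for LM hidden states) are illustrative assumptions.
    """

    def __init__(self, dim=256, heads=4):
        super().__init__()
        # Project handcrafted descriptors and deep CNN features
        # into a shared embedding space.
        self.hand_proj = nn.Linear(128, dim)
        self.deep_proj = nn.Linear(2048, dim)
        self.text_proj = nn.Linear(768, dim)  # e.g., BERT hidden size
        # Intra-modal fusion: self-attention over the concatenated
        # handcrafted + deep image features.
        self.intra_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Inter-modal fusion: text tokens attend to image features.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, 2)  # sarcastic vs. not

    def forward(self, hand_feats, deep_feats, text_feats):
        # hand_feats: (B, Nh, 128), deep_feats: (B, Nd, 2048),
        # text_feats: (B, T, 768)
        img = torch.cat([self.hand_proj(hand_feats),
                         self.deep_proj(deep_feats)], dim=1)
        img, _ = self.intra_attn(img, img, img)      # intra-modal fusion
        txt = self.text_proj(text_feats)
        fused, _ = self.cross_attn(txt, img, img)    # inter-modal fusion
        # Two granularities: a coarse pooled summary and a
        # fine-grained max over token-level fused features.
        coarse = fused.mean(dim=1)
        fine = fused.max(dim=1).values
        return self.classifier(torch.cat([coarse, fine], dim=-1))


if __name__ == "__main__":
    model = GranularityFusion()
    logits = model(torch.randn(2, 10, 128),    # handcrafted descriptors
                   torch.randn(2, 49, 2048),   # CNN region features
                   torch.randn(2, 32, 768))    # LM token embeddings
    print(logits.shape)  # torch.Size([2, 2])
```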
