A Multi-Modal Sarcasm Detection Model Based on Cue Learning

Abstract

The rapid proliferation of internet data, particularly on social media, has amplified the need for effective sentiment analysis, including the complex task of sarcasm detection. This paper presents a novel multi-modal sarcasm detection model that leverages cue learning to address the challenges posed by data scarcity, especially in low-resource languages. The proposed model builds on the CLIP architecture, integrating the text and image modalities to co-learn sarcasm cues. The methodology combines discrete prompt generation, learnable continuous prompt vectors, and multi-modal fusion to improve detection accuracy; the fusion step integrates text and image representations symmetrically, which further improves performance. Experimental results on the Twitter Multi-modal Sarcasm Detection Dataset (MSD) show significant performance gains over traditional models, highlighting the model's robustness and adaptability in small-sample scenarios. This research contributes a practical solution for nuanced sentiment analysis, paving the way for applications in public opinion monitoring and AI-driven decision-making.
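
The following is a minimal PyTorch sketch of the pipeline the abstract describes (learnable continuous prompt vectors prepended to an embedded discrete prompt, a CLIP-style dual encoder, and symmetric fusion of the two modalities), not the authors' implementation. The class name, dimensions, and fusion head are illustrative assumptions, and the small transformer and pooling layers merely stand in for CLIP's pretrained text and image backbones.

import torch
import torch.nn as nn

class PromptedSarcasmDetector(nn.Module):
    """Illustrative CLIP-style dual encoder with learnable continuous prompts."""
    def __init__(self, embed_dim=512, n_ctx=8, vocab_size=49408):
        super().__init__()
        # Stand-ins for CLIP's text and image backbones.
        self.token_embed = nn.Embedding(vocab_size, embed_dim)
        self.text_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.image_encoder = nn.Sequential(
            nn.AdaptiveAvgPool2d(8), nn.Flatten(), nn.Linear(3 * 8 * 8, embed_dim)
        )
        # Freeze the backbones; only the prompts and heads are trained.
        for p in list(self.text_encoder.parameters()) + list(self.image_encoder.parameters()):
            p.requires_grad_(False)
        # Learnable continuous prompt vectors, prepended to the embedded discrete prompt.
        self.ctx = nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)
        # Symmetric fusion of the two modality embeddings, then a binary classifier.
        self.fusion = nn.Linear(2 * embed_dim, embed_dim)
        self.classifier = nn.Linear(embed_dim, 2)  # {not sarcastic, sarcastic}

    def forward(self, token_ids, images):
        batch = token_ids.size(0)
        tokens = self.token_embed(token_ids)               # (B, L, D) discrete prompt + tweet text
        ctx = self.ctx.unsqueeze(0).expand(batch, -1, -1)  # (B, n_ctx, D) continuous prompt
        text_feat = self.text_encoder(torch.cat([ctx, tokens], dim=1)).mean(dim=1)
        image_feat = self.image_encoder(images)            # (B, D) image cue
        fused = torch.relu(self.fusion(torch.cat([text_feat, image_feat], dim=-1)))
        return self.classifier(fused)

# Toy forward pass: 2 tweets of 16 tokens, each paired with a 224x224 RGB image.
model = PromptedSarcasmDetector()
logits = model(torch.randint(0, 49408, (2, 16)), torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 2])

In a prompt-learning setup of this kind, the continuous vectors would presumably be optimized on the MSD training set while the pretrained backbone stays fixed, which is what makes the approach attractive in the small-sample scenarios the abstract emphasizes.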
