Research on Multimodal Hate Speech Detection Based on Self-Attention Mechanism Feature Fusion

Junjie Mao
Hanxiao Shi
Xiaojun Li

Abstract

The rise of multimedia social platforms has diversified how people communicate and what they share. Hate speech, a threat to social harmony, has shifted from a purely textual form to a multimodal one. Most earlier detection methods were limited to the text modality, which makes it difficult to identify and classify emerging multimodal hate speech that combines text and images. This paper proposes a novel multimodal hate speech detection model to address that gap. The joint model uses moving windows to extract multi-level visual features, extracts text features with the pre-trained RoBERTa model, and applies a multi-head self-attention mechanism in the late fusion stage to fuse image and text features. Experiments were conducted on the multimodal benchmark dataset Hateful Memes. The model achieved an accuracy of 0.8780, a precision of 0.9135, an F1-score of 0.8237, and an AUROC of 0.8532, outperforming the state-of-the-art (SOTA) multimodal hate speech recognition model.
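The fusion step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not state the feature width, the number of heads, or how the attended tokens are pooled, so the sketch assumes that image and text features have already been projected to a common width, uses random matrices in place of learned projections, and mean-pools the result. The names `fuse`, `multi_head_self_attention`, and `random_matrix` are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Stand-in for a learned projection (assumption: random, untrained)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(tokens, num_heads, rng):
    """Scaled dot-product self-attention over a list of token vectors.

    Returns (outputs, weights): outputs has one vector per input token,
    weights is indexed as [head][query token][key token].
    """
    d_model = len(tokens[0])
    assert d_model % num_heads == 0, "feature width must divide evenly across heads"
    d_head = d_model // num_heads
    head_outputs, all_weights = [], []
    for _ in range(num_heads):
        # Each head gets its own query, key, and value projections.
        q = matmul(tokens, random_matrix(d_model, d_head, rng))
        k = matmul(tokens, random_matrix(d_model, d_head, rng))
        v = matmul(tokens, random_matrix(d_model, d_head, rng))
        k_t = [list(col) for col in zip(*k)]
        scores = matmul(q, k_t)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
        all_weights.append(weights)
    # Concatenate the heads for each token, then mix them with an output projection.
    concat = [[x for head in head_outputs for x in head[i]] for i in range(len(tokens))]
    return matmul(concat, random_matrix(d_model, d_model, rng)), all_weights


def fuse(image_tokens, text_tokens, num_heads=2, seed=0):
    """Late fusion: attend jointly over image and text tokens, then mean-pool."""
    rng = random.Random(seed)
    tokens = image_tokens + text_tokens  # one sequence, so every token can attend to both modalities
    attended, weights = multi_head_self_attention(tokens, num_heads, rng)
    width = len(tokens[0])
    fused = [sum(tok[j] for tok in attended) / len(attended) for j in range(width)]
    return fused, weights


# Toy inputs: 3 image-region vectors and 2 text-token vectors, all of width 4.
image_tokens = [[0.2, 0.1, 0.0, 0.5], [0.9, 0.3, 0.4, 0.1], [0.0, 0.7, 0.2, 0.6]]
text_tokens = [[0.5, 0.5, 0.1, 0.0], [0.3, 0.8, 0.9, 0.2]]
fused, weights = fuse(image_tokens, text_tokens)
print(len(fused), len(weights), len(weights[0]))
```

Because the image and text tokens sit in one sequence, each row of the attention weights spans both modalities, which is what lets a text token draw on image regions and vice versa. In a trained model the fused vector would feed a classifier head; that part is omitted here because the abstract does not describe it.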

Version published to 10.21203/rs.3.rs-4836799/v1 on Research Square
Sep 20, 2024

Related articles

DDCAF: Dynamic Dual Cross-Attention Fusion Framework for Multimodal Hate Speech Detection
Gauri Kitukale
Navneet Pratap Singh
Sidharth Quamara
No evaluations. Latest version: Sep 15, 2025

DCA-CL: Enhancing Multimodal Emotion Recognition via Dual Cross Attention and Contrastive Learning
Xin Wang
Shubo Liu
Hongshe Dang
Longlong Qiao
Hongnian Yu
No evaluations. Latest version: Aug 27, 2025

Feature Significance in Speech Emotion Recognition
Atul Mishra
Sarthak Jindal
No evaluations. Latest version: Aug 28, 2025
