Hate Speech Detection in Hindi Using Neural Networks

Afreen Sorathiya
Jinal Mehta
Jay Vithlani
Manha Sorathiya
Mohamed Ayaan Gubitra

Abstract

The rise of social media platforms has facilitated rapid communication but has also led to the widespread dissemination of hate speech, particularly in low-resource languages such as Hindi. This study presents a deep learning-based approach for detecting hate speech in Hindi using a Bidirectional Long Short-Term Memory (BiLSTM) architecture. A dataset of 15,000 annotated posts, sourced from Twitter, newspapers, and televised news, was curated to capture both formal and informal language, including code-mixed Hindi-English content. To enhance robustness and generalization, the combined dataset was split into three randomized train-test configurations (10,000 training and 5,000 test posts each), with the model trained and evaluated independently on each. Preprocessing included tokenization, padding, and label encoding; the resulting text sequences were passed through an embedding layer followed by stacked BiLSTM and dense layers. The model achieved consistent accuracy across all splits (72.67% to 74.10%), demonstrating its stability under varied data distributions. The findings underscore the linguistic challenges of hate speech detection in Hindi and propose a multi-split evaluation framework as a reliable alternative to single-split benchmarks. This work contributes to the growing body of research on inclusive and context-aware content moderation systems for underrepresented languages, and lays the groundwork for future advancements involving transformer-based models and multi-label classification.
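The multi-split protocol and preprocessing steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`make_splits`, `build_vocab`, `encode`, `encode_labels`), the whitespace tokenizer, the seeds, the label strings, and the toy sentences are all assumptions made for illustration, and the BiLSTM model itself is left out.

```python
import random

def make_splits(n_items, n_train, n_splits, seed=0):
    """Return n_splits independent (train_idx, test_idx) pairs over range(n_items)."""
    splits = []
    for k in range(n_splits):
        rng = random.Random(seed + k)  # a different shuffle for each split
        idx = list(range(n_items))
        rng.shuffle(idx)
        splits.append((idx[:n_train], idx[n_train:]))
    return splits

def build_vocab(texts):
    """Map each whitespace token to an integer id; 0 is padding, 1 is unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(text, vocab, max_len):
    """Tokenize, map tokens to ids, then truncate or right-pad with 0 to max_len."""
    ids = [vocab.get(tok, 1) for tok in text.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))

def encode_labels(labels):
    """Label encoding: map each distinct class name to a small integer."""
    mapping = {c: i for i, c in enumerate(sorted(set(labels)))}
    return [mapping[c] for c in labels], mapping

# Three randomized 10,000 / 5,000 splits over 15,000 post indices.
splits = make_splits(n_items=15_000, n_train=10_000, n_splits=3)

# Toy romanized sentences (invented here, not taken from the dataset).
toy_texts = ["yeh post theek hai", "yeh bahut bura hai"]
vocab = build_vocab(toy_texts)
print(encode("yeh naya post hai", vocab, max_len=6))  # [2, 1, 3, 5, 0, 0]
```

In a full pipeline the vocabulary would typically be built from each split's training portion only, so that unseen test tokens map to the unknown id, and the padded sequences would then feed the embedding layer.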

Version published to 10.20944/preprints202508.0402.v1
Aug 6, 2025

Related articles

Towards Secure Social Platforms: Hate Speech Detection and Classification in Indian Languages Using Hybrid Soft Computing Techniques

This article has 1 author:
1. Purbani Kar
This article has no evaluations. Latest version: Jul 25, 2025

Hybrid FastText-LSTM for Fake News Detection: A Multilingual Approach with a Focus on Kurdish and English

This article has 2 authors:
1. Azad Karim
2. Bryar Hassan
This article has no evaluations. Latest version: Jul 2, 2025

Somali Dialect Identification: A Low-Resource Benchmark for MAXAA TIRI and MAAY Using Machine and Deep Learning

This article has 5 authors:
1. Abdifatah Ahmed Gedi
2. Yusuf Mohamed Ahmed
3. Shafie Abdi Mohamed
4. Yusuf Ahmed Yusuf
5. Abdénuur Umur Ebdiyow
This article has no evaluations. Latest version: Jul 22, 2025
