Hate Speech Detection in Hindi Using Neural Networks

Abstract

The rise of social media platforms has facilitated rapid communication but has also led to the widespread dissemination of hate speech, particularly in low-resource languages such as Hindi. This study presents a deep learning-based approach for detecting hate speech in Hindi using a Bidirectional Long Short-Term Memory (BiLSTM) architecture. A dataset of 15,000 annotated posts, sourced from Twitter, newspapers, and televised news, was curated to capture both formal and informal language, including code-mixed Hindi-English content. To test robustness and generalization, the combined dataset was split into three randomized train-test configurations (10,000 training posts and 5,000 test posts each), and the model was trained and evaluated independently on each. Preprocessing consisted of tokenization, padding, and label encoding; the resulting text sequences were passed through an embedding layer followed by stacked BiLSTM and dense layers. Accuracy ranged from 72.67% to 74.10% across the three splits, indicating that the model is stable under varied data distributions. The findings underscore the linguistic challenges of hate speech detection in Hindi, and the study proposes multi-split evaluation as a more reliable alternative to single-split benchmarks. This work contributes to the growing body of research on inclusive and context-aware content moderation systems for underrepresented languages, and lays the groundwork for future work on transformer-based models and multi-label classification.
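The multi-split evaluation described in the abstract can be illustrated with a minimal sketch. It assumes the three configurations are independent random shuffles of the full 15,000-post dataset, which the abstract does not state explicitly. The function names, the seed, and the `train_and_score` callback are hypothetical placeholders, not the authors' code; the callback stands in for training the BiLSTM model on one split and returning its test accuracy.

```python
import random
from statistics import mean


def make_splits(n_items=15_000, n_train=10_000, n_splits=3, seed=0):
    """Return n_splits independent random (train_idx, test_idx) pairs.

    Assumption: each configuration is a fresh shuffle of all items,
    with the first n_train indices used for training and the rest for testing.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        indices = list(range(n_items))
        rng.shuffle(indices)
        splits.append((indices[:n_train], indices[n_train:]))
    return splits


def evaluate_over_splits(splits, train_and_score):
    """Call train_and_score on every split and summarize the spread.

    train_and_score is a hypothetical callback: it receives the train and
    test indices, trains a fresh model, and returns a test accuracy.
    """
    scores = [train_and_score(train_idx, test_idx) for train_idx, test_idx in splits]
    return {
        "scores": scores,
        "min": min(scores),
        "max": max(scores),
        "mean": mean(scores),
    }


splits = make_splits()
```

Reporting the minimum and maximum alongside the mean is what makes this kind of evaluation more informative than a single split: a narrow range suggests the result does not depend on one particular partition of the data.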
