Bengali Hate Speech Detection from Social Media using Ensemble Machine Learning Approach.
Abstract
The increasing prevalence of hate speech in Bengali across social media is a growing concern for the government and platform providers. Timely detection and removal of such content are essential to preventing cyber violence and real-world conflicts. However, the informal nature of online communication, with variations in spelling and grammar, makes identification challenging. This study proposes an ensemble-based machine learning model for detecting hate speech in Bengali. A diverse dataset was collected from various online sources, followed by comprehensive preprocessing and classification into three tasks: (i) binary classification (Hate Speech vs. Not Hate), (ii) multi-label classification (categorizing different types of hate speech), and (iii) target identification. We explored machine learning algorithms alongside deep learning models and the ensemble approach. In our proposed approach, we applied bagging with Decision Tree classifiers to create an ensemble model. Then, we built a stacking ensemble model, integrating Random Forest, SVM, Logistic Regression, and the bagging ensemble classifiers. The stacking ensemble achieved 91.49% accuracy with an F1-score of 91.49% on the imbalanced dataset, while on the balanced dataset, accuracy improved to 94.37% with an F1-score of 94.37%.
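The ensemble described in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation: the character n-gram TF-IDF features, the Logistic Regression meta-learner, all hyperparameters, the helper name `build_stacking_model`, and the placeholder English toy texts are assumptions that the abstract does not specify.

```python
from sklearn.ensemble import (
    BaggingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_stacking_model(seed=0):
    """Hypothetical helper: a stacking ensemble over RF, SVM, LR and bagged trees."""
    # Bagging over Decision Trees, as named in the abstract.
    # The tree is passed positionally so the call works across scikit-learn versions.
    bagged_trees = BaggingClassifier(
        DecisionTreeClassifier(random_state=seed),
        n_estimators=10,  # assumed value
        random_state=seed,
    )
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        ("svm", SVC(kernel="linear", random_state=seed)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("bag", bagged_trees),
    ]
    # The meta-learner is an assumption; the abstract does not name one.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )
    # Character n-grams are one assumed way to tolerate spelling variation.
    features = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
    return make_pipeline(features, stack)


# Placeholder toy data standing in for a labelled corpus (0 = not hate, 1 = hate).
texts = [f"friendly message {i}" for i in range(12)]
texts += [f"abusive insult {i}" for i in range(12)]
labels = [0] * 12 + [1] * 12

model = build_stacking_model()
model.fit(texts, labels)
preds = model.predict(["friendly message again", "abusive insult again"])
```

The sketch covers only the binary task (i). The multi-label and target-identification tasks would need their own label encodings, and the reported accuracy figures cannot be reproduced from toy data like this.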