An Investigation of Bias in Bangla Text Classification Models


Abstract

The rapid growth of natural language processing (NLP) applications has raised concerns about fairness and bias in text classification models. Despite significant advances, the evaluation of bias and fairness in Bangla text classification remains underexplored. This study investigates bias in Bangla text classification models, focusing on three key fairness metrics: Demographic Parity, Equalized Odds, and Accuracy Parity. We analyze the performance of widely used models, including Naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), Long Short-Term Memory (LSTM) networks, and Bangla-BERT, on a comprehensive dataset. The results reveal disparities in fairness across models: Bangla-BERT achieves the highest fairness scores but still exhibits measurable bias. To probe these disparities, we conduct an error analysis that highlights the prevalence of bias-induced misclassifications across sensitive attributes. We further propose actionable recommendations for improving fairness in Bangla NLP models, helping to close gaps in ethical AI for low-resource languages. Our findings offer practical guidance for developing more equitable Bangla text classification systems and underscore the need for fairness-aware methodologies in future NLP research.
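
The three fairness criteria named above have standard formal definitions: Demographic Parity asks that the positive-prediction rate be equal across groups of a sensitive attribute, Equalized Odds asks that true- and false-positive rates be equal, and Accuracy Parity asks that per-group accuracy be equal. The sketch below is illustrative only, not code from the paper; the function name and the binary-sensitive-attribute assumption are ours. It shows one common way to measure the gap on each criterion, where a gap of 0.0 means perfect parity:

    import numpy as np

    def group_fairness_gaps(y_true, y_pred, groups):
        """Illustrative sketch: gaps in standard group-fairness metrics.

        y_true, y_pred: binary labels/predictions (0/1).
        groups: sensitive-attribute value per example (assumed binary here).
        """
        y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
        a, b = np.unique(groups)[:2]  # compares the first two groups only

        def rates(g):
            t, p = y_true[groups == g], y_pred[groups == g]
            pos_rate = p.mean()                                # P(Y_hat=1 | A=g)
            tpr = p[t == 1].mean() if (t == 1).any() else 0.0  # true positive rate
            fpr = p[t == 0].mean() if (t == 0).any() else 0.0  # false positive rate
            acc = (p == t).mean()                              # per-group accuracy
            return pos_rate, tpr, fpr, acc

        (pr_a, tpr_a, fpr_a, acc_a) = rates(a)
        (pr_b, tpr_b, fpr_b, acc_b) = rates(b)
        return {
            "demographic_parity_gap": abs(pr_a - pr_b),
            "equalized_odds_gap": max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b)),
            "accuracy_parity_gap": abs(acc_a - acc_b),
        }

    # Hypothetical usage with toy data (values are not from the study):
    gaps = group_fairness_gaps([1, 0, 1, 0], [1, 0, 0, 0], ["m", "f", "m", "f"])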
