On the Development of ToxicBias-Reasoning for Responsible Multicultural Bias Detection and Explanation
Abstract
Bias in language models refers to systematic unfairness toward social groups, making it essential to build datasets that capture such bias in diverse and multicultural contexts. Existing bias detection datasets are limited in cultural diversity, do not capture overlapping categories of bias, and rarely support the generation of human-interpretable reasoning, which restricts their usefulness for responsible AI development. To address these gaps, we introduce the \textit{ToxicBias-Reasoning} dataset of 7,562 statements (5,639 biased, 1,923 non-biased), including a new \textit{Caste} category (247 examples) and additional samples reflecting Indian cultural biases. Our contributions are threefold: (1) we provide a high-quality dataset in which all classification labels are manually annotated, the reasoning test set is fully manual, and the reasoning annotations for training and validation are generated through a GPT-4o-assisted human-in-the-loop pipeline, ensuring scalability while maintaining quality; (2) we establish strong baselines with transformer-based models (BERT, RoBERTa) under hierarchical and multitask configurations, introducing a logic-aware loss function that captures inter-label dependencies in the multilabel category classification task and improves macro-F1 for category-level prediction; and (3) we benchmark reasoning generation with a BART-Large model distilled from GPT-4o outputs, achieving a ROUGE-L score of 45.22. Together, these contributions offer the first comprehensive benchmark for interpretable and culturally inclusive bias detection with reasoning.
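The abstract does not spell out the exact form of the logic-aware loss. One common way to encode inter-label dependencies in multilabel classification is to add a penalty whenever a predicted probability violates a known implication between labels (for instance, a fine-grained category such as \textit{Caste} implying the coarse \textit{biased} label). The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the class name, the `implications` list, and the `penalty_weight` value are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LogicAwareBCELoss(nn.Module):
    """Illustrative logic-aware multilabel loss (hypothetical sketch).

    Combines standard BCE-with-logits with a penalty for violating
    label implications. Each pair (i, j) in `implications` encodes
    "label i implies label j"; the penalty max(0, p_i - p_j) is zero
    whenever the implication already holds in probability space.
    """

    def __init__(self, implications, penalty_weight=0.5):
        super().__init__()
        self.implications = implications      # list of (antecedent_idx, consequent_idx)
        self.penalty_weight = penalty_weight  # trade-off between BCE and logic penalty
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits, targets):
        # Standard multilabel BCE term.
        base = self.bce(logits, targets.float())

        # Logic penalty: accumulate implication violations across the batch.
        probs = torch.sigmoid(logits)
        penalty = logits.new_zeros(())
        for i, j in self.implications:
            penalty = penalty + F.relu(probs[:, i] - probs[:, j]).mean()

        return base + self.penalty_weight * penalty


# Usage sketch: logits come from a BERT/RoBERTa classification head of
# shape (batch_size, num_labels); index 0 is taken here to be the coarse
# "biased" label and index 3 a fine-grained category such as "Caste".
loss_fn = LogicAwareBCELoss(implications=[(3, 0)])
logits = torch.randn(8, 6)
targets = torch.randint(0, 2, (8, 6))
loss = loss_fn(logits, targets)
```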