On the Development of ToxicBias-Reasoning for Responsible Multicultural Bias Detection and Explanation
Abstract
Bias in language models refers to systematic unfairness toward social groups, making it essential to build datasets that capture such bias in diverse and multicultural contexts. Existing bias detection datasets are limited in cultural diversity, do not capture overlapping categories of bias, and rarely support the generation of human-interpretable reasoning, which restricts their usefulness for responsible AI development. To address these gaps, we introduce the \textit{ToxicBias-Reasoning} dataset of 7,562 statements (5,639 biased, 1,923 non-biased), including a new \textit{Caste} category (247 examples) and additional samples reflecting Indian cultural biases. Our contributions are threefold: (1) we provide a high-quality dataset in which all classification labels are manually annotated, the reasoning test set is fully manual, and the reasoning annotations for training and validation are generated through a GPT-4o-assisted human-in-the-loop pipeline, ensuring scalability while maintaining quality; (2) we establish strong baselines with transformer-based models (BERT, RoBERTa) under hierarchical and multitask configurations, introducing a logic-aware loss function that captures inter-label dependencies in the multilabel category classification task and improves macro-F1 for category-level prediction; and (3) we benchmark reasoning generation with a BART-Large model distilled from GPT-4o outputs, achieving a ROUGE-L score of 45.22. Together, these contributions offer the first comprehensive benchmark for interpretable and culturally inclusive bias detection with reasoning.
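The abstract does not spell out the exact form of the logic-aware loss. One common way to encode inter-label dependencies in multilabel classification is to add a penalty whenever a predicted probability violates a known implication between labels (for instance, a fine-grained category such as \textit{Caste} implying the coarse \textit{biased} label). The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the class name, the `implications` list, and the `penalty_weight` value are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LogicAwareBCELoss(nn.Module):
    """Illustrative logic-aware multilabel loss (hypothetical sketch).

    Combines standard BCE-with-logits with a penalty for violating
    label implications. Each pair (i, j) in `implications` encodes
    "label i implies label j"; the penalty max(0, p_i - p_j) is zero
    whenever the implication already holds in probability space.
    """

    def __init__(self, implications, penalty_weight=0.5):
        super().__init__()
        self.implications = implications      # list of (antecedent_idx, consequent_idx)
        self.penalty_weight = penalty_weight  # trade-off between BCE and logic penalty
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits, targets):
        # Standard multilabel BCE term.
        base = self.bce(logits, targets.float())

        # Logic penalty: accumulate implication violations across the batch.
        probs = torch.sigmoid(logits)
        penalty = logits.new_zeros(())
        for i, j in self.implications:
            penalty = penalty + F.relu(probs[:, i] - probs[:, j]).mean()

        return base + self.penalty_weight * penalty


# Usage sketch: logits come from a BERT/RoBERTa classification head of
# shape (batch_size, num_labels); index 0 is taken here to be the coarse
# "biased" label and index 3 a fine-grained category such as "Caste".
loss_fn = LogicAwareBCELoss(implications=[(3, 0)])
logits = torch.randn(8, 6)
targets = torch.randint(0, 2, (8, 6))
loss = loss_fn(logits, targets)
```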