Responsible AI in NLP: GUS-Net Span-Level Bias Detection Dataset and Benchmark for Generalizations, Unfairness, and Stereotypes

Maximus Powers
Shaina Raza
Alex Chang
Rehana Riaz
Umang Mavani
Harshitha Reddy Jonala
Ansh Tiwari
Hua Wei

Abstract

Representational harms in language technologies often occur in short spans within otherwise neutral text, where phrases may simultaneously convey generalizations, unfairness, or stereotypes. Framing bias detection as sentence-level classification obscures which words carry bias and what type is present, limiting both auditability and targeted mitigation. We introduce the GUS-Net Framework, comprising the GUS dataset and a multi-label token-level detector for span-level analysis of social bias. The GUS dataset contains 3,739 unique snippets across multiple domains, with over 69,000 token-level annotations. Each token is labeled using BIO tags (Begin, Inside, Outside) for three pathways of representational harm: Generalizations, Unfairness, and Stereotypes. To ensure reliable data annotation, we employ an automated multi-agent pipeline that proposes candidate spans which are subsequently verified and corrected by human experts. We formulate bias detection as multi-label token-level classification and benchmark both encoder-based models (e.g., BERT family variants) and decoder-based large language models (LLMs). Our evaluations cover token-level identification and span-level entity recognition on our test set, and out-of-distribution generalization. Empirical results show that encoder-based models consistently outperform decoder-based baselines on nuanced and overlapping spans while being more computationally efficient. The framework delivers interpretable, fine-grained diagnostics that enable systematic auditing and mitigation of representational harms in real-world NLP systems.
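The multi-label BIO scheme described in the abstract can be illustrated with a minimal sketch. It assumes spans are given as token index ranges per category and encodes each token as a multi-hot vector, so that one token can begin or continue spans of several types at once. The tag strings (`GEN`, `UNFAIR`, `STEREO`), the helper names, and the example sentence are hypothetical and chosen only for illustration; the abstract names the three categories but does not specify the dataset's actual label strings or storage format.

```python
from typing import Dict, List, Tuple

# Hypothetical tag names: the abstract lists the categories, not the tag strings.
CATEGORIES = ["GEN", "UNFAIR", "STEREO"]
TAGS = ["O"] + [f"{prefix}-{cat}" for cat in CATEGORIES for prefix in ("B", "I")]
TAG_INDEX = {tag: i for i, tag in enumerate(TAGS)}

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def encode_multilabel_bio(num_tokens: int, spans: Dict[str, List[Span]]) -> List[List[int]]:
    """Return one multi-hot vector over TAGS for each token."""
    labels = [[0] * len(TAGS) for _ in range(num_tokens)]
    for cat, cat_spans in spans.items():
        for start, end in cat_spans:
            for pos in range(start, end):
                # The first token of a span gets B-, the rest get I-.
                prefix = "B" if pos == start else "I"
                labels[pos][TAG_INDEX[f"{prefix}-{cat}"]] = 1
    # Tokens covered by no span are marked "O" (an assumption of this sketch).
    for vec in labels:
        if not any(vec):
            vec[TAG_INDEX["O"]] = 1
    return labels


def active_tags(vec: List[int]) -> List[str]:
    """List the tag names switched on in a multi-hot vector."""
    return [TAGS[i] for i, flag in enumerate(vec) if flag]


# Made-up example: a generalization span nested inside a stereotype span.
tokens = ["All", "teenagers", "are", "lazy", "."]
spans = {"GEN": [(0, 2)], "STEREO": [(0, 4)]}
encoded = encode_multilabel_bio(len(tokens), spans)
for token, vec in zip(tokens, encoded):
    print(token, active_tags(vec))
```

Because each token carries an independent 0/1 entry per tag, a detector trained on such targets would typically use a per-tag sigmoid output rather than a single softmax over tags; whether GUS-Net does exactly this is not stated in the abstract.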

Version published to 10.21203/rs.3.rs-7623811/v1 on Research Square
Oct 15, 2025

Related articles

CCF Database: A Machine-Learning-Annotated Corpus of 266,271 Canadian Climate Articles (1978–2024)

This article has 3 authors:
1. Antoine Claude Lemor
2. Alizée Pillod
3. Matthew Taylor
This article has no evaluations. Latest version: Jan 27, 2026

Integrating Explainability for Sentiment Interpretation, Misclassification, and Bias Detection in Women-in-STEM Social Media

This article has 2 authors:
1. Shereen Fouad
2. Ezzaldin Alkooheji
This article has no evaluations. Latest version: Jan 12, 2026

LawLLM-DS: A Two-Stage LoRA Framework for Multi-Label Legal Judgment Prediction with Structured Label Dependencies

This article has 3 authors:
1. Pengcheng Zhao
2. Chengcheng Han
3. Kun Han
This article has no evaluations. Latest version: Jan 13, 2026
