Benchmarking LLM Fairness: Multi-Agent Evaluators for Scalable Model Assessment

Anil Kumar Jonnalagadda

Abstract

As Large Language Models (LLMs) expand into sensitive applications, concerns about fairness and bias have grown significantly. Traditional evaluation benchmarks capture static performance on curated datasets, but they often fail to measure the nuanced ways bias emerges across different contexts. This paper introduces the concept of multi-agent evaluators (independent LLMs configured to assess each other’s outputs) as a scalable methodology for fairness benchmarking. The framework enables adaptive, context-aware assessments in which evaluators detect subtle disparities across demographic groups, task formulations, and linguistic variations. By combining redundancy, diversity, and adversarial prompting, multi-agent evaluation offers a promising path toward more reliable fairness auditing. The study also explores how such approaches integrate with governance frameworks, illustrating their potential in domains such as recruitment, healthcare communication, and automated decision support. Ultimately, the findings argue for fairness benchmarking as a continuous process powered by collaborative LLM evaluators, rather than one-time testing on static datasets.
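
The abstract does not specify how evaluator judgments are combined, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes each evaluator can be treated as a function returning a score in [0, 1] for a response, averages a panel of such evaluators (redundancy), and reports the largest gap in mean score between demographic groups. The names `Evaluator`, `panel_score`, and `group_disparity` are hypothetical, and the two toy evaluators are simple stand-ins for LLM judges.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical evaluator type: maps a model response to a score in [0, 1].
# In practice each evaluator would wrap a call to an independent LLM judge.
Evaluator = Callable[[str], float]


def panel_score(response: str, evaluators: List[Evaluator]) -> float:
    """Average the scores of several independent evaluators."""
    return mean(ev(response) for ev in evaluators)


def group_disparity(
    responses_by_group: Dict[str, List[str]],
    evaluators: List[Evaluator],
) -> Tuple[Dict[str, float], float]:
    """Return the mean panel score per group and the largest gap between groups."""
    group_means = {
        group: mean(panel_score(r, evaluators) for r in responses)
        for group, responses in responses_by_group.items()
    }
    gap = max(group_means.values()) - min(group_means.values())
    return group_means, gap


# Toy stand-ins for LLM evaluators (assumptions for illustration only).
def length_evaluator(text: str) -> float:
    """Score longer responses higher, capped at 1.0 for ten or more words."""
    return min(len(text.split()) / 10, 1.0)


def courtesy_evaluator(text: str) -> float:
    """Score 1.0 if the response contains a courtesy marker, else 0.5."""
    lowered = text.lower()
    return 1.0 if ("thank" in lowered or "please" in lowered) else 0.5


if __name__ == "__main__":
    # Responses a model gave to otherwise identical prompts that differ only
    # in a demographic attribute (hypothetical data).
    responses = {
        "group_a": ["Thank you for applying, we will review your file soon."],
        "group_b": ["Rejected."],
    }
    means, gap = group_disparity(responses, [length_evaluator, courtesy_evaluator])
    print(means, gap)
```

A large gap on prompts that differ only in a demographic attribute would flag the case for closer review; how to set that threshold, and how to add diversity and adversarial prompting to the panel, is left open here.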

Version published to 10.20944/preprints202512.1084.v1
Dec 11, 2025
