Quantifying Claim Robustness Through Adversarial Framing: A Conceptual Framework for an AI-Enabled Diagnostic Tool
Abstract
Objectives: We introduce the conceptual framework for the Adversarial Claim Robustness Diagnostics (ACRD) protocol, a novel tool for assessing how factual claims withstand ideological distortion.

Methods: Drawing on semantics, adversarial collaboration, and the devil's-advocate approach, we develop a three-phase evaluation process that combines baseline evaluation, adversarial speaker reframing, and dynamic AI calibration with quantified robustness scoring. We introduce the Claim Robustness Index as our final validity measure.

Results: We model the evaluation of claims by ideologically opposed groups as a strategic game with a Bayesian-Nash equilibrium to infer the normative behavior of evaluators after the reframing phase. The ACRD addresses shortcomings of traditional fact-checking approaches and employs large language models to simulate counterfactual attributions while mitigating potential biases.

Conclusions: The framework's ability to identify the boundary conditions of persuasive validity across polarized groups can be tested in salient societal and political debates, from climate change to trade policy.
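The abstract names a Claim Robustness Index but does not define its formula. As a purely illustrative sketch, one might assume the index compares post-reframing validity ratings to baseline ratings across ideologically opposed evaluator groups; the function name, input structure, and ratio-based formula below are all assumptions, not the authors' specification.

```python
# Hypothetical sketch of a Claim Robustness Index (CRI).
# ASSUMPTION: CRI is taken here to be the mean post-reframing validity
# rating divided by the mean baseline rating, averaged over groups.
# The ACRD paper's actual formula may differ.

from statistics import mean

def claim_robustness_index(baseline: dict[str, list[float]],
                           reframed: dict[str, list[float]]) -> float:
    """Average per-group ratio of post-reframing to baseline ratings.

    baseline/reframed map a group name to validity ratings in [0, 1].
    A CRI near 1.0 means perceived validity survives adversarial
    speaker reframing; values well below 1.0 flag ideology-sensitive
    (non-robust) claims.
    """
    ratios = [mean(reframed[g]) / mean(baseline[g]) for g in baseline]
    return mean(ratios)

# Example: two opposed groups rate a claim before and after reframing.
baseline = {"group_a": [0.8, 0.7, 0.9], "group_b": [0.6, 0.5, 0.7]}
reframed = {"group_a": [0.7, 0.7, 0.8], "group_b": [0.5, 0.4, 0.6]}
print(round(claim_robustness_index(baseline, reframed), 3))  # → 0.875
```

Under this assumed definition, a claim whose ratings drop sharply for only one group would still pull the averaged index down, which is one way a scalar score could capture asymmetric ideological distortion.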