A Formal Framework for Evaluating Reasoning Integrity in Language Models

Abstract

Traditional evaluation of language models prioritizes final-answer accuracy, offering limited insight into the reasoning processes that produce those outputs. This paper introduces a formal framework for evaluating reasoning integrity by modeling inference as a trajectory of belief states under uncertainty. We define externally observable belief states that capture hypotheses, uncertainty distributions, and constraints at each reasoning step, enabling analysis without reliance on internal model representations.

Building on this formulation, we propose a divergence functional that quantifies sustained disagreement between reasoning trajectories, together with a complexity regularization term that penalizes excessive or redundant reasoning. These components are combined into a unified scoring function that balances consistency and parsimony. To operationalize the framework, we introduce a multi-stage evaluation protocol that constrains intermediate reasoning, injects minimal adversarial perturbations, and measures both divergence and repair cost.

We establish theoretical properties of the proposed metrics, including boundedness, invariance under semantic-preserving transformations, and stability under controlled perturbations. Analytical examples illustrate how the framework distinguishes robust reasoning processes from brittle or superficial ones that maintain correctness without internal consistency. By shifting evaluation from outcomes to the dynamics of reasoning, this framework provides a principled basis for assessing reliability and stability in modern language models.
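To make the scoring idea concrete, the following is a minimal sketch, not the paper's definitions: it assumes belief states can be represented as discrete probability distributions over a shared hypothesis set, uses Jensen-Shannon divergence as a stand-in for the divergence functional, and penalizes trajectory length as a simple proxy for the complexity regularization term. All names (js_divergence, trajectory_divergence, integrity_score) and the weight lambda_reg are illustrative assumptions.

```python
# Hedged sketch of a consistency-vs-parsimony score over belief-state trajectories.
# A belief state here is a discrete probability distribution over hypotheses;
# a trajectory is the sequence of belief states across reasoning steps.

import math
from typing import List, Sequence


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (base 2, bounded in [0, 1]) between two belief states."""
    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def trajectory_divergence(traj_a: List[Sequence[float]],
                          traj_b: List[Sequence[float]]) -> float:
    """Mean per-step divergence between two step-aligned trajectories (a crude
    proxy for 'sustained disagreement')."""
    steps = min(len(traj_a), len(traj_b))
    return sum(js_divergence(a, b) for a, b in zip(traj_a, traj_b)) / steps


def integrity_score(traj_a: List[Sequence[float]],
                    traj_b: List[Sequence[float]],
                    lambda_reg: float = 0.05) -> float:
    """Higher is better: low sustained disagreement plus a length-based
    complexity penalty standing in for the regularization term."""
    divergence = trajectory_divergence(traj_a, traj_b)
    complexity = lambda_reg * max(len(traj_a), len(traj_b))
    return -(divergence + complexity)


# Example: beliefs over three hypotheses at each of three reasoning steps,
# comparing a baseline trajectory with a mildly perturbed one.
baseline = [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05]]
perturbed = [[0.5, 0.4, 0.1], [0.6, 0.3, 0.1], [0.85, 0.1, 0.05]]
print(integrity_score(baseline, perturbed))
```

Under this sketch, a brittle reasoner whose perturbed trajectory drifts far from the baseline (large per-step divergence) or pads its reasoning with redundant steps (larger length penalty) scores lower, which mirrors the consistency-parsimony trade-off the abstract describes.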
