A Formal Framework for Evaluating Reasoning Integrity in Language Models

Abstract

Traditional evaluation of language models prioritizes final-answer accuracy, offering limited insight into the reasoning processes that produce those outputs. This paper introduces a formal framework for evaluating reasoning integrity by modeling inference as a trajectory of belief states under uncertainty. We define externally observable belief states that capture hypotheses, uncertainty distributions, and constraints at each reasoning step, enabling analysis without reliance on internal model representations.

Building on this formulation, we propose a divergence functional that quantifies sustained disagreement between reasoning trajectories, together with a complexity regularization term that penalizes excessive or redundant reasoning. These components are combined into a unified scoring function that balances consistency and parsimony. To operationalize the framework, we introduce a multi-stage evaluation protocol that constrains intermediate reasoning, injects minimal adversarial perturbations, and measures both divergence and repair cost.

We establish theoretical properties of the proposed metrics, including boundedness, invariance under semantic-preserving transformations, and stability under controlled perturbations. Analytical examples illustrate how the framework distinguishes robust reasoning processes from brittle or superficial ones that maintain correctness without internal consistency. By shifting evaluation from outcomes to the dynamics of reasoning, this framework provides a principled basis for assessing reliability and stability in modern language models.
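To make the scoring idea concrete, the following is a minimal sketch, not the paper's definitions: it assumes belief states can be represented as discrete probability distributions over a shared hypothesis set, uses Jensen-Shannon divergence as a stand-in for the divergence functional, and penalizes trajectory length as a simple proxy for the complexity regularization term. All names (js_divergence, trajectory_divergence, integrity_score) and the weight lambda_reg are illustrative assumptions.

```python
# Hedged sketch of a consistency-vs-parsimony score over belief-state trajectories.
# A belief state here is a discrete probability distribution over hypotheses;
# a trajectory is the sequence of belief states across reasoning steps.

import math
from typing import List, Sequence


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (base 2, bounded in [0, 1]) between two belief states."""
    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def trajectory_divergence(traj_a: List[Sequence[float]],
                          traj_b: List[Sequence[float]]) -> float:
    """Mean per-step divergence between two step-aligned trajectories (a crude
    proxy for 'sustained disagreement')."""
    steps = min(len(traj_a), len(traj_b))
    return sum(js_divergence(a, b) for a, b in zip(traj_a, traj_b)) / steps


def integrity_score(traj_a: List[Sequence[float]],
                    traj_b: List[Sequence[float]],
                    lambda_reg: float = 0.05) -> float:
    """Higher is better: low sustained disagreement plus a length-based
    complexity penalty standing in for the regularization term."""
    divergence = trajectory_divergence(traj_a, traj_b)
    complexity = lambda_reg * max(len(traj_a), len(traj_b))
    return -(divergence + complexity)


# Example: beliefs over three hypotheses at each of three reasoning steps,
# comparing a baseline trajectory with a mildly perturbed one.
baseline = [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05]]
perturbed = [[0.5, 0.4, 0.1], [0.6, 0.3, 0.1], [0.85, 0.1, 0.05]]
print(integrity_score(baseline, perturbed))
```

Under this sketch, a brittle reasoner whose perturbed trajectory drifts far from the baseline (large per-step divergence) or pads its reasoning with redundant steps (larger length penalty) scores lower, which mirrors the consistency-parsimony trade-off the abstract describes.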
