Limits of Self-Correction in LLMs: An Information-Theoretic Analysis of Correlated Errors

Abstract

We develop a diagnostic framework for evaluating when LLM self-evaluation can be trusted. The framework's central results are: (1) under a shared-blind-spot modeling assumption, with joint conditional independence of the evaluations given the shared failure structure, k rounds of self-critique provide information about correctness bounded by what the shared latent failure variable Z mediates rather than by any independent channel, so that confidence accumulated through repeated self-evaluation reflects the shared failure structure rather than independently accumulated evidence; and (2) a selector satisfying two independently measurable sufficient conditions (a bounded false-acceptance rate and a true-acceptance rate exceeding that bound) provides a quantifiable lower bound on the evidence it contributes about correctness. Both results are conditional on explicit modeling assumptions. We also prove an information-theoretic bound showing that self-evaluation can add only a limited amount of information when a shared latent failure structure mediates both generation and evaluation errors; we foreground this as scaffolding rather than as a primary contribution, since the latent variable requires independent operationalization for the bound to have empirical bite. The diagnostic framework identifies what to measure to determine whether a deployed system is in the failure regime, and what properties an external selector must have to escape it. We describe design principles for an architecture motivated by this analysis; same-model context separation is an engineering heuristic, not a theoretical solution, and we present it as a practical starting point pending empirical validation.
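
In symbols, a minimal sketch of the first result under the abstract's stated assumption (the notation C for correctness and E_1, ..., E_k for the k self-evaluations is illustrative; only Z is named above). Reading the assumption as the evaluations being jointly independent of correctness given Z, the chain C \to Z \to (E_1, \dots, E_k) is Markov, and the data-processing inequality gives

\[
I(C;\, E_1, \dots, E_k) \;\le\; I(C;\, Z),
\]

so additional rounds of self-critique cannot supply more information about correctness than the shared failure variable Z already mediates.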
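
Similarly, one natural formalization of the selector result (the symbols \alpha and \beta are illustrative, not fixed by the abstract): if the false-acceptance rate satisfies P(\text{accept} \mid \text{incorrect}) \le \beta and the true-acceptance rate satisfies P(\text{accept} \mid \text{correct}) \ge \alpha with \alpha > \beta, then each acceptance event carries a log-likelihood ratio of at least

\[
\log \frac{P(\text{accept} \mid \text{correct})}{P(\text{accept} \mid \text{incorrect})} \;\ge\; \log \frac{\alpha}{\beta} \;>\; 0,
\]

a quantifiable lower bound on the evidence about correctness that the external selector contributes; both rates are measurable independently of the model under evaluation.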
