Knowing Before Speaking: In-Computation Metacognition Precedes Verbal Confidence in Large Language Models

Abstract

We propose the Knowledge Landscape hypothesis: the forward pass of a large language model (LLM) encodes whether it knows the answer before producing any output token. Well-learned knowledge corresponds to deep convergence valleys in the activation landscape; unlearned queries traverse flat plains where signals disperse. These geometric properties manifest as measurable signals (token-level entropy and layer-wise hidden-state variance) that precede and causally influence the model's output uncertainty. On TriviaQA with Qwen2.5-7B and Mistral-7B, token entropy strongly discriminates known from unknown questions (Mann-Whitney p < 10⁻⁷, rank-biserial r > 0.5 across both architectures). Hidden-state variance localises a metacognitive locus at layers 9 and 20–27 (peak p < 10⁻⁴, r = 0.46). Activation patching with monotone interpolation provides causal confirmation: entropy decreases strictly as the known hidden state is progressively substituted, yielding a Spearman rank correlation of −1 (permutation p < 0.001). A single-pass abstention system built on these signals achieves an area under the ROC curve of 0.804 and a 5.6-percentage-point accuracy gain over the unaided baseline, without any fine-tuning.
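The two signals named in the abstract, token-level entropy and interpolation-based activation patching, can be illustrated with a minimal numerical sketch. The vocabulary size, logit values, and interpolation grid below are invented for illustration and are not taken from the paper; the sketch only demonstrates the qualitative claim that entropy falls strictly (Spearman rank correlation of −1) as a peaked "known" state is progressively blended in.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a logit vector."""
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def token_entropy(logits):
    """Shannon entropy (nats) of the next-token distribution."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum())

# Hypothetical hidden-state surrogates over a toy 50-token vocabulary:
# a flat "unknown" logit vector (near-uniform, high entropy) and a
# "known" vector with one strongly preferred token (low entropy).
vocab = 50
unknown = np.zeros(vocab)
known = np.zeros(vocab)
known[7] = 10.0  # arbitrary preferred token

# Monotone interpolation: progressively substitute the known state,
# as in the activation-patching experiment described in the abstract.
alphas = np.linspace(0.0, 1.0, 11)
entropies = [token_entropy((1 - a) * unknown + a * known) for a in alphas]

# Entropy decreases strictly along the interpolation path, so the
# rank correlation between alpha and entropy is exactly -1.
assert all(e1 > e2 for e1, e2 in zip(entropies, entropies[1:]))
```

Because the entropy sequence is strictly decreasing while alpha is strictly increasing, their Spearman rank correlation is −1 by construction; in the paper's setting the same monotonicity is tested on real hidden states with a permutation test rather than assumed.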
