Knowing Before Speaking: In-Computation Metacognition Precedes Verbal Confidence in Large Language Models

Abstract

We propose the Knowledge Landscape hypothesis: the forward pass of a large language model (LLM) encodes whether it knows the answer before producing any output token. Well-learned knowledge corresponds to deep convergence valleys in the activation landscape; unlearned queries traverse flat plains where signals disperse. These geometric properties manifest as measurable signals (token-level entropy and layer-wise hidden-state variance) that precede and causally influence the model's output uncertainty. On TriviaQA with Qwen2.5-7B and Mistral-7B, token entropy strongly discriminates known from unknown questions (Mann-Whitney p < 10⁻⁷, rank-biserial r > 0.5 across both architectures). Hidden-state variance localises a metacognitive locus at layers 9 and 20–27 (peak p < 10⁻⁴, r = 0.46). Activation patching with monotone interpolation provides causal confirmation: entropy decreases strictly as the known hidden state is progressively substituted, yielding a Spearman rank correlation of −1 (permutation p < 0.001). A single-pass abstention system built on these signals achieves an area under the ROC curve of 0.804 and a 5.6-percentage-point accuracy gain over the unaided baseline, without any fine-tuning.
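The two signals named in the abstract, token-level entropy and interpolation-based activation patching, can be illustrated with a minimal numerical sketch. The vocabulary size, logit values, and interpolation grid below are invented for illustration and are not taken from the paper; the sketch only demonstrates the qualitative claim that entropy falls strictly (Spearman rank correlation of −1) as a peaked "known" state is progressively blended in.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a logit vector."""
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def token_entropy(logits):
    """Shannon entropy (nats) of the next-token distribution."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum())

# Hypothetical hidden-state surrogates over a toy 50-token vocabulary:
# a flat "unknown" logit vector (near-uniform, high entropy) and a
# "known" vector with one strongly preferred token (low entropy).
vocab = 50
unknown = np.zeros(vocab)
known = np.zeros(vocab)
known[7] = 10.0  # arbitrary preferred token

# Monotone interpolation: progressively substitute the known state,
# as in the activation-patching experiment described in the abstract.
alphas = np.linspace(0.0, 1.0, 11)
entropies = [token_entropy((1 - a) * unknown + a * known) for a in alphas]

# Entropy decreases strictly along the interpolation path, so the
# rank correlation between alpha and entropy is exactly -1.
assert all(e1 > e2 for e1, e2 in zip(entropies, entropies[1:]))
```

Because the entropy sequence is strictly decreasing while alpha is strictly increasing, their Spearman rank correlation is −1 by construction; in the paper's setting the same monotonicity is tested on real hidden states with a permutation test rather than assumed.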
