Knowing Before Speaking: In-Computation Metacognition Precedes Verbal Confidence in Large Language Models

Abstract

We propose the Knowledge Landscape hypothesis: a large language model’s forward pass encodes whether it knows the answer before producing any output token. Well-learned knowledge traverses deep convergence valleys in the activation landscape; unlearned queries traverse flat plains where signals disperse. These geometric properties manifest as two probe-free, single-pass signals—token-level entropy and layer-wise hidden-state variance—that precede and causally influence output uncertainty. Across two architecturally distinct models (Qwen2.5-7B and Mistral-7B) on TriviaQA, token entropy strongly discriminates known from unknown questions with large effect sizes, replicated at 300 samples per condition with a 95% bootstrap confidence interval entirely above 0.64. Hidden-state variance further localises a metacognitive locus in both architectures, consistently at 61–69% of total network depth, suggesting this is a universal structural property of transformer LLMs. Activation patching confirms causality: injecting a known-question hidden state into an unknown-question forward pass monotonically reduces output entropy. A lightweight abstention system built on these signals achieves a ROC-AUC of 0.804 and a 5.6 percentage-point accuracy gain over the unaided baseline, without any fine-tuning or additional training data.
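The two signals named in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes token entropy is the Shannon entropy of the softmax over the model's first-output-token logits, and that layer-wise hidden-state variance is the variance of activations across hidden dimensions at each layer; all function names here are illustrative.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution.

    `logits` is a 1-D array of raw, unnormalised scores over the
    vocabulary for the first output token (assumption: this is the
    position at which the signal is read).
    """
    logits = logits - logits.max()            # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-(probs * np.log(probs + 1e-12)).sum())

def layerwise_hidden_variance(hidden_states):
    """Per-layer variance of a single token's hidden state.

    `hidden_states` is a list of 1-D arrays, one per transformer
    layer, for one token position. The depth profile of these values
    is the kind of curve in which the paper localises a metacognitive
    locus at roughly 61-69% of network depth.
    """
    return [float(np.var(h)) for h in hidden_states]

# A flat (uniform) logit vector has maximal entropy, log(vocab_size);
# a sharply peaked one has entropy near zero.
uniform = token_entropy(np.zeros(4))          # == log(4) ≈ 1.386
peaked = token_entropy(np.array([10.0, 0.0, 0.0, 0.0]))
```

With a Hugging Face-style model, the inputs for these functions would come from a single forward pass (e.g. requesting hidden states for all layers alongside the logits), which is what makes the signals probe-free and single-pass.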
