Knowing Before Speaking: In-Computation Metacognition Precedes Verbal Confidence in Large Language Models

Abstract

We propose the Knowledge Landscape hypothesis: a large language model’s forward pass encodes whether it knows the answer before producing any output token. Well-learned knowledge traverses deep convergence valleys in the activation landscape; unlearned queries traverse flat plains where signals disperse. These geometric properties manifest as two probe-free, single-pass signals—token-level entropy and layer-wise hidden-state variance—that precede and causally influence output uncertainty. Across two architecturally distinct models (Qwen2.5-7B and Mistral-7B) on TriviaQA, token entropy strongly discriminates known from unknown questions with large effect sizes, replicated at 300 samples per condition with a 95% bootstrap confidence interval entirely above 0.64. Hidden-state variance further localises a metacognitive locus in both architectures, consistently at 61–69% of total network depth, suggesting this is a universal structural property of transformer LLMs. Activation patching confirms causality: injecting a known-question hidden state into an unknown-question forward pass monotonically reduces output entropy. A lightweight abstention system built on these signals achieves a ROC-AUC of 0.804 and a 5.6 percentage-point accuracy gain over the unaided baseline, without any fine-tuning or additional training data.
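The two signals named in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes token entropy is the Shannon entropy of the softmax over the model's first-output-token logits, and that layer-wise hidden-state variance is the variance of activations across hidden dimensions at each layer; all function names here are illustrative.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution.

    `logits` is a 1-D array of raw, unnormalised scores over the
    vocabulary for the first output token (assumption: this is the
    position at which the signal is read).
    """
    logits = logits - logits.max()            # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-(probs * np.log(probs + 1e-12)).sum())

def layerwise_hidden_variance(hidden_states):
    """Per-layer variance of a single token's hidden state.

    `hidden_states` is a list of 1-D arrays, one per transformer
    layer, for one token position. The depth profile of these values
    is the kind of curve in which the paper localises a metacognitive
    locus at roughly 61-69% of network depth.
    """
    return [float(np.var(h)) for h in hidden_states]

# A flat (uniform) logit vector has maximal entropy, log(vocab_size);
# a sharply peaked one has entropy near zero.
uniform = token_entropy(np.zeros(4))          # == log(4) ≈ 1.386
peaked = token_entropy(np.array([10.0, 0.0, 0.0, 0.0]))
```

With a Hugging Face-style model, the inputs for these functions would come from a single forward pass (e.g. requesting hidden states for all layers alongside the logits), which is what makes the signals probe-free and single-pass.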
