A Geometric Taxonomy of Hallucinations in LLMs
Abstract
The term “hallucination” conflates distinct failure modes with distinct geometric signatures in embedding space. We propose a taxonomy identifying three types: unfaithfulness (Type I: ignoring provided context), confabulation (Type II: inventing semantically foreign content), and factual error (Type III: wrong details within correct conceptual frames). We introduce two detection methods grounded in this taxonomy: the Semantic Grounding Index (SGI) for Type I, which measures whether a response moves toward provided context on the unit hypersphere, and the Directional Grounding Index (\(\Gamma\)) for Type II, which measures displacement geometry in context-free settings. \(\Gamma\) achieves AUROC \(0.958 \pm 0.034\) on human-crafted confabulations with 3.8% cross-domain degradation. External validation on three independently collected, human-annotated benchmarks (WikiBio GPT-3 [1], FELM [2], and ExpertQA [3]) yields domain-specific AUROC 0.581–0.695, with \(\Gamma\) outperforming an NLI CrossEncoder baseline by \(\Delta = 0.243\) on expert-domain data where surface entailment operates at chance. On LLM-generated benchmarks, detection is domain-local. We examine the Type III boundary through TruthfulQA [4], where the apparent classifier signal (LR AUROC 0.731) is traced to a stylistic annotation confound: false answers are geometrically closer to queries than truthful ones, a pattern incompatible with factual-error detection. This distinguishes a theoretical constraint from a methodological limitation.
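The abstract does not give the exact formulas for SGI or \(\Gamma\). As a rough illustration of "movement toward provided context on the unit hypersphere", a minimal sketch under assumed definitions is shown below; the function names (unit, grounding_score), the choice of a query-to-response displacement vector, and the dot product against the context direction are illustrative assumptions, not the paper's method.

```python
import numpy as np

def unit(v: np.ndarray) -> np.ndarray:
    """Project a vector onto the unit hypersphere."""
    return v / np.linalg.norm(v)

def grounding_score(query_emb: np.ndarray,
                    response_emb: np.ndarray,
                    context_emb: np.ndarray) -> float:
    """Toy grounding measure (assumption, not the paper's SGI formula):
    project the query-to-response displacement onto the context direction.
    Positive values mean the response moved toward the provided context;
    values near zero or negative suggest the context was ignored."""
    q, r, c = unit(query_emb), unit(response_emb), unit(context_emb)
    displacement = r - q  # how the response shifted relative to the query
    return float(np.dot(displacement, c))
```

Any sentence-embedding model could supply the vectors here; the point is only that a displacement-based score of this kind separates responses that move toward the supplied context from those that drift elsewhere, which is the geometric intuition behind the Type I and Type II measures described above.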