An Information-Theoretic Model of Abduction for Detecting Hallucinations in Explanations
Abstract
We present An Information-Theoretic Model of Abduction for Detecting Hallucinations in Explanations, a neuro-symbolic framework that combines entropy-based inference with abductive reasoning to identify unsupported or contradictory content in large language model outputs. Our approach treats hallucination detection as a dual optimization problem: minimizing the information gain between source-conditioned and response-conditioned belief distributions while selecting the minimal abductive hypothesis capable of explaining discourse-salient claims. By incorporating discourse structure through RST-derived EDU weighting, the model distinguishes legitimate abductive elaborations from claims that cannot be justified under any computationally plausible hypothesis. Experimental evaluation across medical, factual QA, and multi-hop reasoning datasets demonstrates that the proposed method outperforms state-of-the-art neural and symbolic baselines in both accuracy and interpretability. Qualitative analysis further shows that the framework exposes plausible-sounding but abductively unsupported model errors, including real hallucinations generated by GPT-5.1. Together, these results indicate that integrating information-theoretic divergence and abductive explanation provides a principled and effective foundation for robust hallucination detection in generative systems.
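As a rough illustration of the dual objective described above, the sketch below scores a response by combining an EDU-weighted divergence between source-conditioned and response-conditioned belief distributions with the cost of the cheapest abductive hypothesis that explains the salient claims. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`kl_divergence`, `hallucination_score`), the KL-divergence instantiation of information gain, the linear combination with weight `lam`, and the hypothesis-size cost are all assumptions introduced here for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def hallucination_score(source_beliefs, response_beliefs, edu_weights,
                        candidate_hypotheses, hypothesis_cost, lam=1.0):
    """Combine EDU-weighted information gain with the minimal abductive hypothesis cost.

    Assumed interface (not from the paper): one belief distribution per EDU for the
    source and the response, an RST-derived salience weight per EDU, and a set of
    candidate hypotheses scored by a user-supplied cost function.
    """
    # Term 1: discourse-weighted divergence between response-conditioned and
    # source-conditioned belief distributions, accumulated over EDUs.
    info_gain = sum(
        w * kl_divergence(p_resp, p_src)
        for w, p_src, p_resp in zip(edu_weights, source_beliefs, response_beliefs)
    )
    # Term 2: cost of the cheapest hypothesis that explains the salient claims;
    # infinite if no candidate covers them, which acts as a strong hallucination signal.
    min_hyp_cost = min((hypothesis_cost(h) for h in candidate_hypotheses),
                       default=float("inf"))
    return info_gain + lam * min_hyp_cost

# Toy usage: a single EDU and two candidate hypotheses scored by their size.
score = hallucination_score(
    source_beliefs=[[0.7, 0.3]],
    response_beliefs=[[0.1, 0.9]],
    edu_weights=[1.0],
    candidate_hypotheses=[{"h1"}, {"h1", "h2"}],
    hypothesis_cost=len,
)
print(score)
```

In this reading, a large combined score flags content as abductively unsupported: either the response shifts beliefs far from what the source licenses, or no cheap hypothesis can account for its discourse-salient claims. How the actual framework forms belief distributions, enumerates hypotheses, and thresholds the score is not specified in the abstract.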