An Information-Theoretic Model of Abduction for Detecting Hallucinations in Explanations

Abstract

We present An Information-Theoretic Model of Abduction for Detecting Hallucinations in Generative Models, a neuro-symbolic framework that combines entropy-based inference with abductive reasoning to identify unsupported or contradictory content in large language model outputs. Our approach treats hallucination detection as a dual optimization problem: minimizing the information gain between source-conditioned and response-conditioned belief distributions, while simultaneously selecting the minimal abductive hypothesis capable of explaining discourse-salient claims. By incorporating discourse structure through RST-derived EDU weighting, the model distinguishes legitimate abductive elaborations from claims that cannot be justified under any computationally plausible hypothesis. Experimental evaluation across medical, factual QA, and multi-hop reasoning datasets demonstrates that the proposed method outperforms state-of-the-art neural and symbolic baselines in both accuracy and interpretability. Qualitative analysis further shows that the framework successfully exposes plausible-sounding but abductively unsupported model errors, including real hallucinations generated by GPT-5.1. Together, these results indicate that integrating information-theoretic divergence and abductive explanation provides a principled and effective foundation for robust hallucination detection in generative systems.
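The abstract describes the detector as a weighted information-gain computation over discourse units combined with a minimal-hypothesis selection step. The following is a minimal illustrative sketch of that idea only, not the paper's implementation: the function names (kl_divergence, hallucination_score, minimal_abductive_hypothesis), the use of KL divergence as the information-gain measure, the toy EDU weights, and the size-based hypothesis cost are all assumptions introduced here for illustration.

```python
import math
from typing import Dict, List, Optional, Set


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) over a shared discrete support (assumed divergence measure)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def hallucination_score(
    edu_weights: List[float],            # hypothetical RST-derived salience weight per EDU
    source_beliefs: List[List[float]],   # belief distribution per EDU, conditioned on the source
    response_beliefs: List[List[float]]  # belief distribution per EDU, conditioned on the response
) -> float:
    """Weighted divergence between source- and response-conditioned beliefs."""
    total = sum(
        w * kl_divergence(q, p)
        for w, p, q in zip(edu_weights, source_beliefs, response_beliefs)
    )
    return total / max(sum(edu_weights), 1e-12)


def minimal_abductive_hypothesis(
    claims: Set[str], hypotheses: Dict[str, Set[str]]
) -> Optional[str]:
    """Pick the cheapest hypothesis covering all claims (cost = size, an assumption)."""
    covering = [h for h, entailed in hypotheses.items() if claims <= entailed]
    return min(covering, key=lambda h: len(hypotheses[h])) if covering else None


# Toy usage: two EDUs with binary (supported / unsupported) belief distributions.
score = hallucination_score(
    edu_weights=[1.0, 0.4],
    source_beliefs=[[0.9, 0.1], [0.8, 0.2]],
    response_beliefs=[[0.2, 0.8], [0.7, 0.3]],
)
print(f"weighted divergence: {score:.3f}")  # flag a hallucination above a tuned threshold

best = minimal_abductive_hypothesis(
    claims={"c1", "c2"},
    hypotheses={"H_small": {"c1", "c2"}, "H_large": {"c1", "c2", "c3", "c4"}},
)
print(f"minimal covering hypothesis: {best}")
```

In this sketch a claim is flagged when the weighted divergence is high and no low-cost hypothesis covers it; the actual scoring, weighting, and hypothesis-cost definitions would be those given in the full article.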
