How Can Hallucinatory Biases Be Effectively Audited and Mitigated in Vision-Language Models? A Unified Theoretical and Empirical Framework Across GPT-4o, Grok 3, and Claude Sonnet 4.5
Abstract
Vision-Language Models (VLMs) represent a transformative class of multimodal artificial intelligence systems that integrate deep visual perception with large-scale language generation, enabling sophisticated reasoning across image captioning, visual question answering, and embodied AI applications. Despite extraordinary benchmark performance, these models suffer from a well-documented and consequential failure mode: hallucinatory bias, wherein the model generates textual content that is unsupported by, or outright contradicts, the visual evidence presented at inference time. This paper delivers a comprehensive theoretical and empirical framework for the principled auditing and systematic mitigation of such biases, evaluated simultaneously across three frontier VLMs: OpenAI’s GPT-4o, xAI’s Grok 3, and Anthropic’s Claude Sonnet 4.5. We introduce Multimodal Semantic Entropy (H_ms) as a novel information-theoretic auditing metric that jointly captures linguistic and visual uncertainty, and we provide rigorous convergence proofs showing that H_ms converges almost surely to the true hallucination indicator at rate O(1/min(N, M)). For mitigation, we propose Adaptive Contrastive Logit Subtraction (ACLS), a principled inference-time algorithm with a complete proof of KL-divergence minimization at the geometric rate (1 − α̂ P_hall)². We additionally prove a sharp geometric convergence theorem and an information-theoretic detection lower bound that characterizes the fundamental statistical difficulty of hallucination detection. Eight TikZ-generated figures, embedded throughout the paper body, illustrate the hallucination taxonomy, auditing pipeline, uncertainty profiles, mitigation architecture, convergence trajectories, and calibration curves. Experimental evaluations on MS-COCO, NoCaps, and MMBench demonstrate hallucination-rate reductions of 37–43% after applying ACLS, alongside reductions of up to 44% in the Bias Disparity Index (BDI) for demographic fairness. We additionally derive confidence-interval bounds for the Bias Disparity Index and formalize the mitigation-coverage tradeoff as an optimization problem. Ethical guidelines for responsible deployment conclude the framework.
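To make the mitigation idea concrete, the sketch below shows what inference-time contrastive logit subtraction can look like. The abstract does not give the ACLS update rule, so the specific form used here (a visually grounded pass minus an alpha-scaled image-ablated pass) and the names logits_with_image, logits_without_image, and alpha are illustrative assumptions, not the paper's definitive implementation; in ACLS proper the weight would presumably be adapted from an estimate of the hallucination probability P_hall rather than held fixed.

```python
# Minimal sketch of inference-time contrastive logit subtraction.
# NOTE: the exact ACLS update rule is not specified in the abstract; the form
# below (visually grounded logits minus a scaled "blind" pass, with a fixed
# weight alpha) is an illustrative assumption.
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def contrastive_logit_subtraction(logits_with_image, logits_without_image, alpha=0.5):
    """Down-weight tokens whose probability is driven by the language prior
    (i.e., high even without the image) rather than by visual evidence."""
    adjusted = (1.0 + alpha) * logits_with_image - alpha * logits_without_image
    return softmax(adjusted)

# Toy vocabulary of four tokens: token 2 is visually grounded,
# token 0 is a language-prior hallucination (likely even with the image masked).
logits_img   = np.array([2.0, 0.5, 3.0, 0.1])   # full multimodal pass
logits_blind = np.array([3.5, 0.5, 1.0, 0.1])   # image ablated / masked pass

print("baseline   :", np.round(softmax(logits_img), 3))
print("contrastive:", np.round(contrastive_logit_subtraction(logits_img, logits_blind), 3))
```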