How Can Hallucinatory Biases Be Effectively Audited and Mitigated in Vision-Language Models? A Unified Theoretical and Empirical Framework Across GPT-4o, Grok 3, and Claude Sonnet 4.5

Abstract

Vision-Language Models (VLMs) represent a transformative class of multimodal artificial intelligence systems that integrate deep visual perception with large-scale language generation, enabling sophisticated reasoning across image captioning, visual question answering, and embodied AI applications. Despite extraordinary benchmark performance, these models suffer from a well-documented and consequential failure mode: hallucinatory bias, wherein the model generates textual content that is unsupported by, or outright contradicts, the visual evidence presented at inference time. This paper delivers a comprehensive theoretical and empirical framework for the principled auditing and systematic mitigation of such biases, evaluated simultaneously across three frontier VLMs: OpenAI's GPT-4o, xAI's Grok 3, and Anthropic's Claude Sonnet 4.5. We introduce Multimodal Semantic Entropy ($H_{\mathrm{ms}}$) as a novel information-theoretic auditing metric that jointly captures linguistic and visual uncertainty. We provide rigorous convergence proofs showing that $H_{\mathrm{ms}}$ converges almost surely to the true hallucination indicator at rate $O(1/\min(N, M))$. For mitigation, we propose Adaptive Contrastive Logit Subtraction (ACLS), a principled inference-time algorithm with a complete proof of KL-divergence minimization at geometric rate $(1 - \hat{\alpha}\hat{P}_{\mathrm{hall}})^2$. We additionally prove a sharp geometric convergence theorem and an information-theoretic detection lower bound that characterizes the fundamental statistical difficulty of hallucination detection. Eight TikZ-generated figures, embedded throughout the paper body, illustrate the hallucination taxonomy, auditing pipeline, uncertainty profiles, mitigation architecture, convergence trajectories, and calibration curves. Experimental evaluations on MS-COCO, NoCaps, and MMBench demonstrate hallucination rate reductions of 37–43% after applying ACLS, alongside up to a 44% reduction in the Bias Disparity Index (BDI) for demographic fairness. We further derive confidence-interval bounds for the BDI and formalize the mitigation–coverage tradeoff as an optimization problem. Ethical guidelines for responsible deployment conclude the framework.
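
To make the two headline contributions concrete, the sketches below illustrate the general shape of each idea. They are minimal illustrations built on assumptions, not the paper's implementation: the abstract does not define $H_{\mathrm{ms}}$ or ACLS in detail, so the function names, the exact-match clustering stand-in, and the degraded-image prior are all hypothetical.

First, a plug-in estimate of semantic entropy over $N$ sampled captions, in the spirit of "jointly captures linguistic and visual uncertainty." A real $H_{\mathrm{ms}}$ estimator would presumably cluster captions by semantic equivalence (e.g., bidirectional entailment) and average over $M$ visual perturbations; normalized exact-match clustering and a single image stand in for both here.

```python
import math
from collections import Counter

def semantic_entropy(captions):
    """Entropy over semantic clusters of N sampled captions.

    Hypothetical sketch: clusters are formed by normalized exact
    match, a crude stand-in for NLI-based semantic clustering.
    Averaging this quantity over M perturbed views of the image
    would give one plausible form of a multimodal estimator.
    """
    clusters = Counter(c.strip().lower() for c in captions)
    n = sum(clusters.values())
    return -sum((k / n) * math.log(k / n) for k in clusters.values())
```

Second, one decoding step of a contrastive logit subtraction scheme, assuming ACLS resembles visual contrastive decoding: logits conditioned on the true image are contrasted against logits from a degraded view that approximates the hallucination-prone language prior.

```python
import numpy as np

def acls_step(logits_visual, logits_prior, alpha=0.5):
    """One hypothetical ACLS-style decoding step.

    logits_visual: next-token logits conditioned on the true image.
    logits_prior:  next-token logits conditioned on a degraded image
                   (e.g., blurred or blank), approximating the
                   hallucination-prone prior.
    alpha:         contrastive strength; an adaptive variant would
                   plausibly tie this to an estimated hallucination
                   probability, consistent with the abstract's rate
                   (1 - alpha_hat * P_hall_hat)^2.
    """
    contrastive = (1 + alpha) * logits_visual - alpha * logits_prior
    z = contrastive - contrastive.max()      # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    return probs                              # mitigated next-token distribution
```

In this reading, high $H_{\mathrm{ms}}$ flags inputs where generation is unstable, and the contrastive step then downweights tokens favored by the text-only prior rather than by the visual evidence.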
