How Can Hallucinatory Biases Be Effectively Audited and Mitigated in Vision-Language Models? A Unified Theoretical and Empirical Framework Across GPT-4o, Grok 3, and Claude Sonnet 4.5

Abstract

Vision-Language Models (VLMs) represent a transformative class of multimodal artificial intelligence systems that integrate deep visual perception with large-scale language generation, enabling sophisticated reasoning across image captioning, visual question answering, and embodied AI applications. Despite extraordinary benchmark performance, these models suffer from a well-documented and consequential failure mode: hallucinatory bias, wherein the model generates textual content that is unsupported by, or outright contradicts, the visual evidence presented at inference time. This paper delivers a comprehensive theoretical and empirical framework for the principled auditing and systematic mitigation of such biases, evaluated simultaneously across three frontier VLMs: OpenAI's GPT-4o, xAI's Grok 3, and Anthropic's Claude Sonnet 4.5. We introduce Multimodal Semantic Entropy ($H_{\mathrm{ms}}$) as a novel information-theoretic auditing metric that jointly captures linguistic and visual uncertainty. We provide rigorous convergence proofs showing that $H_{\mathrm{ms}}$ converges almost surely to the true hallucination indicator at rate $O(1/\min(N, M))$. For mitigation, we propose Adaptive Contrastive Logit Subtraction (ACLS), a principled inference-time algorithm with a complete proof of KL-divergence minimization at geometric rate $(1 - \hat{\alpha}\hat{P}_{\mathrm{hall}})^2$. We additionally prove a sharp geometric convergence theorem and an information-theoretic detection lower bound that characterizes the fundamental statistical difficulty of hallucination detection. Eight TikZ-generated figures, embedded throughout the paper body, illustrate the hallucination taxonomy, auditing pipeline, uncertainty profiles, mitigation architecture, convergence trajectories, and calibration curves. Experimental evaluations on MS-COCO, NoCaps, and MMBench demonstrate hallucination rate reductions of 37–43% after applying ACLS, alongside up to a 44% reduction in the Bias Disparity Index (BDI) for demographic fairness. We further derive confidence-interval bounds for the BDI and formalize the mitigation–coverage tradeoff as an optimization problem. Ethical guidelines for responsible deployment conclude the framework.
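
To make the two headline contributions concrete, the sketches below illustrate the general shape of each idea. They are minimal illustrations built on assumptions, not the paper's implementation: the abstract does not define $H_{\mathrm{ms}}$ or ACLS in detail, so the function names, the exact-match clustering stand-in, and the degraded-image prior are all hypothetical.

First, a plug-in estimate of semantic entropy over $N$ sampled captions, in the spirit of "jointly captures linguistic and visual uncertainty." A real $H_{\mathrm{ms}}$ estimator would presumably cluster captions by semantic equivalence (e.g., bidirectional entailment) and average over $M$ visual perturbations; normalized exact-match clustering and a single image stand in for both here.

```python
import math
from collections import Counter

def semantic_entropy(captions):
    """Entropy over semantic clusters of N sampled captions.

    Hypothetical sketch: clusters are formed by normalized exact
    match, a crude stand-in for NLI-based semantic clustering.
    Averaging this quantity over M perturbed views of the image
    would give one plausible form of a multimodal estimator.
    """
    clusters = Counter(c.strip().lower() for c in captions)
    n = sum(clusters.values())
    return -sum((k / n) * math.log(k / n) for k in clusters.values())
```

Second, one decoding step of a contrastive logit subtraction scheme, assuming ACLS resembles visual contrastive decoding: logits conditioned on the true image are contrasted against logits from a degraded view that approximates the hallucination-prone language prior.

```python
import numpy as np

def acls_step(logits_visual, logits_prior, alpha=0.5):
    """One hypothetical ACLS-style decoding step.

    logits_visual: next-token logits conditioned on the true image.
    logits_prior:  next-token logits conditioned on a degraded image
                   (e.g., blurred or blank), approximating the
                   hallucination-prone prior.
    alpha:         contrastive strength; an adaptive variant would
                   plausibly tie this to an estimated hallucination
                   probability, consistent with the abstract's rate
                   (1 - alpha_hat * P_hall_hat)^2.
    """
    contrastive = (1 + alpha) * logits_visual - alpha * logits_prior
    z = contrastive - contrastive.max()      # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    return probs                              # mitigated next-token distribution
```

In this reading, high $H_{\mathrm{ms}}$ flags inputs where generation is unstable, and the contrastive step then downweights tokens favored by the text-only prior rather than by the visual evidence.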
