A Multi-agent Court to Mitigate VLM Hallucinations

Abstract

Hallucinations have hindered the widespread use of vision language models (VLMs) for domain-specific applications such as road maintenance. While previous researchers constructed multiple solutions for different sources of visual hallucinations, knowledge gaps persist in handling context-dependent hallucinations where the targeted objects are difficult to prompt precisely. This research addresses hallucination by converting image-to-text binary classifications into evidential arguments made by VLM agents, each providing a binary Yes/No answer with a justification. The proposed solution assigns distinct roles to VLM agents, starting with a detection unit in which a primary detector is paired with a reviewer that verifies scope compatibility. These agents interact to aggregate their findings and justifications into a single, unified verdict. The agents' roles are inspired by the distinctive duties of the prosecutor, the defence counsel, and the judge, while the questioning techniques used by the justification reviewer draw on lawyers' courtroom examination techniques and argumentation schemes. Experiments are performed on toppled poles in road scene images from the Urban Issue dataset, and broader applicability is assessed on subsets of the PhD dataset annotated on COCO2014. Results show that our solution achieved a superior overturning rate of 30% and a 2.3-percentage-point increase in F1 score in the domain-specific application, while requiring 50% less time than the closest multi-agent solution. Comparable detection performance and efficient resource consumption were also observed in the general setting.
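To make the court analogy concrete, the sketch below shows one way such a pipeline could be wired together in Python. The `VLMBackend` interface, all prompts, and the function names are illustrative assumptions rather than the paper's actual implementation; a real system would wrap a VLM API call and use the authors' role-specific prompts.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of a court-style multi-agent pipeline.
# A backend is any callable mapping (prompt, image_path) -> model reply text.
VLMBackend = Callable[[str, str], str]


@dataclass
class Argument:
    verdict: bool        # binary Yes/No answer
    justification: str   # evidence supporting the verdict


def ask(vlm: VLMBackend, image: str, prompt: str) -> Argument:
    reply = vlm(prompt, image)
    return Argument(reply.strip().lower().startswith("yes"), reply)


def detect(vlm: VLMBackend, image: str, question: str) -> Argument:
    # Primary detector: binary classification with a justification.
    return ask(vlm, image, f"Answer Yes or No, then justify: {question}")


def review_scope(vlm: VLMBackend, image: str, question: str,
                 arg: Argument) -> bool:
    # Reviewer: checks the justification stays within the question's scope.
    prompt = (f"Question: {question}\nJustification: {arg.justification}\n"
              "Does the justification address only this question? Yes or No.")
    return vlm(prompt, image).strip().lower().startswith("yes")


def court_verdict(vlm: VLMBackend, image: str, question: str) -> Argument:
    initial = detect(vlm, image, question)
    if not review_scope(vlm, image, question, initial):
        initial = detect(vlm, image, question)  # re-elicit on scope failure

    # Prosecutor attacks the initial verdict; defence counsel supports it.
    stance = "Yes" if initial.verdict else "No"
    prosecution = ask(vlm, image,
                      f"Argue AGAINST the answer '{stance}' to: {question} "
                      "Answer Yes or No, then justify.")
    defence = ask(vlm, image,
                  f"Argue FOR the answer '{stance}' to: {question} "
                  "Answer Yes or No, then justify.")

    # Judge: aggregates both arguments into a single unified verdict.
    return ask(vlm, image,
               f"Question: {question}\n"
               f"Prosecution: {prosecution.justification}\n"
               f"Defence: {defence.justification}\n"
               "Weigh both sides. Answer Yes or No, then justify.")


if __name__ == "__main__":
    # Dummy backend so the sketch runs end-to-end without a real VLM.
    dummy: VLMBackend = lambda prompt, image: "Yes, the pole leans visibly."
    print(court_verdict(dummy, "road_scene.jpg", "Is there a toppled pole?"))
```

Separating the attacking and defending arguments before a final judging pass mirrors the adversarial structure the abstract describes: the verdict is not a simple majority vote but an aggregation of opposing justifications.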
