Architectural Hallucination Mitigation in Multi-Agent Document Intelligence: Domain-Isolated RAG and Dual-Channel Claim Verification
Abstract
Hallucination in large language model (LLM) pipelines is the central unsolved problem in enterprise document intelligence: models produce fluent, structurally plausible outputs that are factually unsupported by the source document. Existing mitigation strategies operate within the generative process (retrieval augmentation, self-consistency, chain-of-thought grounding) and reduce, but do not eliminate, this failure mode. We address hallucination at the architecture level through three mechanisms implemented within MIKA (Multi-modal Intelligent Knowledge Analysis), a five-layer multi-agent platform for intelligent document processing. (C1) Domain-isolated multi-query RAG. Each of MIKA's 12 reasoning agents maintains a strictly isolated vector partition (Kᵢ ∩ Kⱼ = ∅) and issues four independent queries per analytical dimension, eliminating cross-domain retrieval contamination architecturally. A three-stage ablation on KGPPS clinical NLP validates C1: shared-index baseline, MAD = 2.16; domain isolation with equivalent content, MAD = 1.91; full C1, MAD = 1.58, isolating a 0.33 causal contribution of partition isolation over content enrichment alone. (C2) Dual-channel claim verification. A deterministic source channel S accesses the source of record independently of the LLM; every structured claim is verified against Tₛ before delivery, and claims without grounding are refused rather than hallucinated. Calibrated on 1,240 documents (8,431 claims), C2 reduces delivered hallucination from 10.0% to 0.29% at 97.1% claim precision. The pattern generalizes to any domain with a machine-accessible ground truth (OCR, ERP, HL7/FHIR, contract text). (C5) Schema-free, training-free extraction. Extraction targets are natural-language specifications supplied at runtime and validated by C2. On 180 documents spanning four heterogeneous types, C5 achieves 91.7% field accuracy vs. 73.8% for a template-based industrial baseline, without retraining.
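The domain-isolated multi-query retrieval pattern of (C1) can be sketched minimally. Everything below is a hypothetical illustration, not MIKA's implementation: the class and function names are invented, and a toy token-overlap scorer stands in for embedding similarity. What the sketch shows is the architectural property itself: each agent can only ever see its own partition, so cross-domain contamination is impossible by construction, and each analytical dimension is covered by several independent queries whose hits are unioned.

```python
# Hypothetical sketch of (C1): per-agent disjoint vector partitions plus
# multiple independent queries per analytical dimension. Names and the
# toy overlap scorer are illustrative assumptions, not the paper's code.
from collections import defaultdict

class IsolatedIndex:
    """One partition per agent domain; partitions share no content (Ki ∩ Kj = ∅)."""
    def __init__(self):
        self.partitions = defaultdict(list)  # domain -> list of text chunks

    def add(self, domain, chunk):
        self.partitions[domain].append(chunk)

    def search(self, domain, query, k=2):
        # Toy relevance: shared lowercase tokens (stand-in for embedding
        # similarity). Only the querying agent's own partition is visible.
        q = set(query.lower().split())
        scored = sorted(self.partitions[domain],
                        key=lambda c: len(q & set(c.lower().split())),
                        reverse=True)
        return scored[:k]

def multi_query_retrieve(index, domain, dimension_queries):
    """Issue several independent queries per dimension; union the hits."""
    hits = []
    for q in dimension_queries:
        hits.extend(index.search(domain, q))
    return list(dict.fromkeys(hits))  # de-duplicate, preserve order

idx = IsolatedIndex()
idx.add("cardiology", "patient shows elevated troponin levels")
idx.add("billing", "invoice total 1,280.00 EUR")  # invisible to the cardiology agent

queries = ["troponin elevated", "cardiac markers", "chest pain onset", "ECG findings"]
results = multi_query_retrieve(idx, "cardiology", queries)
```

Because the billing chunk lives in a different partition, no cardiology query can ever retrieve it; in a shared index, a lexically similar off-domain chunk could leak into the context.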
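The dual-channel verification of (C2) admits an equally small sketch. Again, this is an assumed illustration: the `Claim` type, the exact-substring grounding test, and the example data are all hypothetical simplifications (real grounding would be more tolerant of formatting). The point it demonstrates is the refusal semantics: a claim the deterministic source channel cannot ground is withheld, never delivered.

```python
# Hypothetical sketch of (C2): every LLM-produced structured claim is
# checked against the deterministic source of record; ungrounded claims
# are refused rather than delivered. Exact-substring matching is a
# deliberate simplification for illustration.
from dataclasses import dataclass

@dataclass
class Claim:
    field: str
    value: str

def verify_claims(claims, source_text):
    """Split claims into delivered (grounded in the source) and refused."""
    delivered, refused = [], []
    for c in claims:
        if c.value and c.value in source_text:
            delivered.append(c)
        else:
            refused.append(c)  # refused, not hallucinated
    return delivered, refused

source = "Invoice 4711 total: 1,280.00 EUR due 2024-05-01"
claims = [Claim("invoice_id", "4711"),
          Claim("total", "1,280.00 EUR"),
          Claim("currency", "USD")]  # unsupported by the source text
ok, bad = verify_claims(claims, source)
```

The same gate works against any machine-accessible ground truth (OCR output, an ERP record, an HL7/FHIR resource, contract text): only the implementation of the grounding check changes.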