From Inference-Time Routing to Ingestion-Time Graphs: Referential Discovery, Actor-Agent Parallelism, and Formal Completeness Guarantees for Deterministic Multi-Hop RAG

Ruben Jaime

Abstract

The document dependency graph is a property of the corpus, not of the query. This observation, simple in retrospect yet absent from the literature, is the foundation of this paper. Because the dependency structure between documents exists independently of any query, it can be pre-computed at ingestion time. Once pre-computed, multi-hop retrieval becomes a deterministic graph traversal with formal completeness guarantees, eliminating the need for autonomous LLM agents to make routing decisions at inference time. This paper introduces Referential Discovery, an incremental ingestion pattern that pre-computes cross-document chunk references via a two-pass procedure and a pending-queue mechanism for forward references, reducing multi-hop resolution from $O(b^{D} \cdot \log N_{\mathrm{idx}})$ to $O(|V_c|)$ at inference time. Referential Discovery enables Controlled Multi-Hop RAG, a deterministic, auditable four-stage pipeline implementing a parallel multi-agent architecture in which each Ray actor operates as a formally defined Actor-Agent [29] with plan-based coordination, achieving full dependency-chain traversal without delegating routing to an autonomous LLM loop. Infrastructure homogeneity is preserved throughout: the dependency graph is stored in PostgreSQL, the same persistence layer already required by the base system, so no additional infrastructure is introduced. A multi-tenant production evaluation across six independent HR technology organizations, comprising 5,169 real-world curriculum vitae documents and 500 concurrent queries at three complexity levels, demonstrates full dependency-chain traversal in under 62 minutes of total wall-clock time ($\bar{t} < 7.44$ s/query), with deterministic behavior and full step-level auditability.
Domain practitioner evaluation by professional recruiters at each tenant organization confirmed zero hallucinations across all 500 responses. The paper attributes this result not to measurement luck but to architectural construction: if the dependency graph is complete, retrieval is complete; if retrieval is complete, hallucination is architecturally impossible. Consistent results across six independent corpora demonstrate cross-tenant generalizability. Prompt injection, unbounded tool execution, and EU AI Act non-compliance, all structural risks of autonomous agentic systems, are eliminated by design.
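The ingestion-time idea in the abstract can be illustrated with a small in-memory sketch. This is not the paper's implementation: the class `ReferenceGraph`, its methods, and the chunk identifiers are hypothetical, the graph is held in Python dictionaries rather than PostgreSQL, and the "two-pass" step shown here is only one plausible reading of the procedure the abstract names. It shows forward references being parked in a pending queue until their target is ingested, and inference-time resolution reduced to a deterministic traversal that visits each reachable chunk once.

```python
from collections import defaultdict, deque


class ReferenceGraph:
    """Toy stand-in for a pre-computed dependency graph (hypothetical names)."""

    def __init__(self):
        self.known = set()                # chunk ids already ingested
        self.edges = defaultdict(set)     # chunk id -> ids of chunks it references
        self.pending = defaultdict(set)   # missing target id -> waiting source ids

    def ingest(self, chunk_id, references):
        # Pass 1: register the chunk and link references whose target exists.
        self.known.add(chunk_id)
        for target in references:
            if target in self.known:
                self.edges[chunk_id].add(target)
            else:
                # Forward reference: park it until the target is ingested.
                self.pending[target].add(chunk_id)
        # Pass 2: resolve forward references that were waiting for this chunk.
        for source in self.pending.pop(chunk_id, set()):
            self.edges[source].add(chunk_id)

    def closure(self, seeds):
        # Deterministic breadth-first traversal over pre-computed edges.
        # Each reachable chunk is visited exactly once; no LLM routing involved.
        order, visited = [], set()
        queue = deque(sorted(seeds))
        while queue:
            node = queue.popleft()
            if node in visited or node not in self.known:
                continue
            visited.add(node)
            order.append(node)
            queue.extend(sorted(self.edges.get(node, ())))
        return order


# Illustrative chunk ids, ingested in an order that forces forward references.
graph = ReferenceGraph()
graph.ingest("cv1#summary", ["cv1#experience"])
graph.ingest("cv1#experience", ["cert#aws"])
graph.ingest("cert#aws", [])

chain = graph.closure(["cv1#summary"])
print(chain)
```

Sorting neighbors before enqueueing is a design choice made here so that repeated runs return the same order, which is what makes each traversal step reproducible and auditable in this toy version.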

Version published to 10.21203/rs.3.rs-9334226/v1 on Research Square
Apr 16, 2026
