RCT-MARS: When Per-Query Retrieval Routing Fails, and What It Takes to Succeed

Abstract

An oracle that selects the best retrieval paradigm per query substantially outperforms any fixed method, yet learned routers consistently fail to beat strong baselines. We call this the routing paradox. We study it across five retrieval paradigms (BM25, dense embeddings, knowledge-graph retrieval, agentic multi-step search, and cross-encoder reranking) and six benchmarks. Hard routing (discrete paradigm selection) fails severely: the best hard router (XGBoost-Direct, 0.387) falls 7.8 pp below the best fixed paradigm (Reranker, 0.465) despite an oracle ceiling of 0.599. It therefore closes \(-57.9\%\) of the oracle gap (the difference between the oracle and the best fixed paradigm), where a negative value means the router lands below the baseline it was meant to beat. We then show what it takes to succeed: Corpus-Aware Soft Routing (CASR) replaces discrete selection with learned per-query fusion weights, obtained via XGBoost multi-output regression and a temperature-scaled softmax (\(\tau = 0.05\)). CASR achieves 0.487 nDCG@10, significantly outperforming Dense (\(+4.9\) pp, \(p = 0.002\)), numerically exceeding the best fixed paradigm (Reranker, \(+2.2\) pp, \(p = 0.112\), not statistically significant), and matching unsupervised fusion (RRF, 0.482; \(p = 0.755\)). CASR closes \(+16.4\%\) of the oracle gap versus \(-57.9\%\) for hard routing, a swing of about 74 percentage points (\(+16.4 - (-57.9) = 74.3\)). To diagnose why hard routing fails and soft routing succeeds, RCT-MARS (Retrieval Complexity Taxonomy through Multi-paradigm Algorithm Routing and Selection) constructs performance-signature vectors, clusters them to discover three stable complexity classes (ARI \(= 0.773\), 95% CI \([0.529, 0.991]\)), and uses the resulting taxonomy as a diagnostic lens. The key insight is that discrete paradigm selection requires precise information that query features alone cannot provide, whereas soft routing hedges across paradigms, tolerating prediction uncertainty while exploiting paradigm complementarity.
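
To make the oracle-gap arithmetic and the soft-routing idea concrete, the following is a minimal Python sketch, not the authors' implementation. The function names, the example scores, and the fusion rule (a weighted sum of per-paradigm document scores) are assumptions made for illustration; the abstract states only that CASR predicts per-query fusion weights with XGBoost multi-output regression and passes them through a softmax with \(\tau = 0.05\). Here the regressor's output is stood in for by a hypothetical list of predicted per-paradigm scores.

```python
import math


def oracle_gap_closed(score, best_fixed, oracle):
    """Fraction of the (oracle - best fixed paradigm) gap that a method closes.

    Negative values mean the method scores below the best fixed paradigm.
    """
    return (score - best_fixed) / (oracle - best_fixed)


def softmax_weights(predicted_scores, tau=0.05):
    """Temperature-scaled softmax over predicted per-paradigm scores.

    A small tau sharpens the weights toward the top paradigm while
    still leaving some mass on close competitors.
    """
    top = max(predicted_scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / tau) for s in predicted_scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(weights, paradigm_doc_scores):
    """Rank documents by a weighted sum of per-paradigm scores.

    ASSUMPTION: scores are already normalized to a comparable range;
    the abstract does not specify how the weights are applied.
    """
    fused = {}
    for weight, doc_scores in zip(weights, paradigm_doc_scores):
        for doc, score in doc_scores.items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return sorted(fused, key=fused.get, reverse=True)


# Gap closure from the rounded scores quoted in the abstract.
casr_gap = oracle_gap_closed(0.487, best_fixed=0.465, oracle=0.599)
hard_gap = oracle_gap_closed(0.387, best_fixed=0.465, oracle=0.599)

# Hypothetical predicted scores for two paradigms on a single query.
weights = softmax_weights([0.40, 0.45], tau=0.05)
```

With the rounded scores from the abstract, `casr_gap` comes out at roughly 0.164 and `hard_gap` at roughly -0.58; the small difference from the reported \(-57.9\%\) presumably reflects unrounded scores in the paper. In the two-paradigm example, a predicted difference of 0.05 at \(\tau = 0.05\) yields weights of about 0.27 and 0.73, so the weaker-looking paradigm still contributes to the fused ranking. That is the hedging behavior the abstract contrasts with discrete selection, which would commit entirely to one paradigm.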
