Task-Aware Retrieval Selection Mechanisms for Large Language Model Reasoning


Abstract

Most retrieval-augmented generation pipelines rely on a fixed retrieval configuration across tasks, even though different reasoning problems benefit from distinct retrieval depth, granularity, and timing. This work introduces task-aware retrieval selection mechanisms that adapt retrieval behavior to the characteristics of each query and downstream objective. We design a two-level controller: a query-level policy network that predicts retrieval depth, document granularity, and whether to perform single-shot or iterative retrieval, and a step-level policy that can trigger additional retrieval based on intermediate model uncertainty. Both policies are parameterized by lightweight transformers that consume query embeddings, preliminary generation traces, and calibration features such as entropy and self-consistency variance. We train the policies with an off-policy contextual bandit objective using logged interactions from 1.2 million queries covering factoid QA, multi-hop reasoning, and code explanation tasks. When plugged into a standard RAG pipeline with a 13B language model, the task-aware mechanism improves overall Exact Match by 3.7 points and reasoning success rate by 5.4 points on StrategyQA, HotpotQA, and GSM8K-style synthetic math QA, while reducing average retrieval calls by 18.2%. Detailed analysis shows that the policy learns to avoid unnecessary retrieval for simple questions and to allocate more retrieval budget to compositional and numerically intensive problems.
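The two-level controller described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the action spaces, feature layout, thresholds, and class names (`QueryLevelPolicy`, `step_level_trigger`) are all assumptions, and simple linear heads stand in for the lightweight transformers used in the paper.

```python
import numpy as np

# Hypothetical action spaces for the query-level policy (assumed, not
# taken from the paper): how many documents to fetch and at what
# granularity. A depth of 0 means "answer without retrieval".
DEPTHS = [0, 1, 3, 5]
GRANULARITIES = ["sentence", "passage", "document"]


class QueryLevelPolicy:
    """Toy stand-in for the query-level controller: one linear head per
    decision, applied to a feature vector that concatenates the query
    embedding with calibration features (entropy, self-consistency
    variance). The paper uses a lightweight transformer instead."""

    def __init__(self, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_depth = rng.normal(scale=0.1, size=(dim, len(DEPTHS)))
        self.W_gran = rng.normal(scale=0.1, size=(dim, len(GRANULARITIES)))
        self.w_iter = rng.normal(scale=0.1, size=dim)

    def act(self, features: np.ndarray):
        depth = DEPTHS[int(np.argmax(features @ self.W_depth))]
        gran = GRANULARITIES[int(np.argmax(features @ self.W_gran))]
        iterative = bool(features @ self.w_iter > 0.0)
        return depth, gran, iterative


def step_level_trigger(token_entropy: float, sc_variance: float,
                       entropy_thresh: float = 2.5,
                       var_thresh: float = 0.15) -> bool:
    """Step-level policy: request another retrieval round whenever the
    model's intermediate uncertainty exceeds either (assumed) threshold."""
    return token_entropy > entropy_thresh or sc_variance > var_thresh


# Example rollout: a 14-dim query embedding (random here) plus two
# calibration features [mean token entropy, self-consistency variance].
rng = np.random.default_rng(1)
features = np.concatenate([rng.normal(size=14), [1.8, 0.05]])
policy = QueryLevelPolicy(dim=16)
depth, gran, iterative = policy.act(features)
```

In a full pipeline, `step_level_trigger` would be consulted after each intermediate generation step, while the logged `(features, action, reward)` tuples would feed the off-policy contextual bandit update; both are omitted here for brevity.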