Multi-Source Context Integration through Lightweight Reconstruction for Retrieval-Augmented Generation

Abstract

Real-world retrieval-augmented generation (RAG) systems increasingly draw evidence from heterogeneous sources such as web indices, vector databases, code repositories, and structured tables. Naive concatenation of multi-source outputs often yields excessively long contexts and conflicting signals. We propose a lightweight multi-source context integration framework that reconstructs a unified input representation using minimal additional parameters. The system first applies source-specific encoders to produce dense passage representations and uncertainty scores. A gating-based selector then chooses a small subset of passages across all sources under a global context budget, optimizing a differentiable objective that trades off source diversity against estimated utility. The selected passages are fed into a transformer equipped with low-rank adapters, which performs cross-source interaction and produces a reconstructed context sequence for the base large language model. Our implementation adds fewer than 3% additional parameters to a 13B model. Evaluations on a mixed benchmark comprising KILT, CodeSearchNet QA, and a proprietary table QA dataset with 120k queries show that the proposed method improves overall answer F1 by 4.9 points over single-source RAG and by 3.2 points over simple multi-source concatenation, while reducing average context tokens by 29.4%. The gains are most pronounced on queries requiring both unstructured text and structured evidence, highlighting the importance of principled multi-source integration.
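To make the selection step concrete, the sketch below shows one way a gating-based, budget-constrained selector of the kind described in the abstract could be set up. It is a minimal illustration, not the authors' implementation: the class name BudgetedGatingSelector, the sigmoid-relaxed gates, and the specific penalty terms (a soft budget penalty plus a negative-entropy term over per-source selection shares as a stand-in for the diversity/utility trade-off) are assumptions introduced here for clarity.

```python
# Hypothetical sketch of a gating-based passage selector under a global
# context budget. Gates are relaxed (sigmoid) so the selection objective
# stays differentiable during training; hard top-k is used at inference.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BudgetedGatingSelector(nn.Module):
    def __init__(self, dim: int, num_sources: int, budget: int):
        super().__init__()
        self.budget = budget
        # Utility head: passage representation (+ uncertainty score) -> logit.
        self.score = nn.Linear(dim + 1, 1)
        # Learned source embedding so the gate can condition on the source type.
        self.source_emb = nn.Embedding(num_sources, dim)

    def forward(self, passage_vecs, uncertainty, source_ids, temperature=1.0):
        # passage_vecs: (N, dim), uncertainty: (N,), source_ids: (N,) long
        feats = passage_vecs + self.source_emb(source_ids)
        logits = self.score(torch.cat([feats, uncertainty.unsqueeze(-1)], dim=-1)).squeeze(-1)
        gates = torch.sigmoid(logits / temperature)  # relaxed selection in [0, 1]

        # Soft penalty for exceeding the global context budget.
        budget_pen = F.relu(gates.sum() - self.budget) ** 2

        # Diversity term: negative entropy of the per-source share of selected mass,
        # so minimizing the penalty spreads the selection across sources.
        per_source = torch.zeros(self.source_emb.num_embeddings, device=gates.device)
        per_source = per_source.index_add(0, source_ids, gates)
        share = per_source / (gates.sum() + 1e-6)
        diversity_pen = (share * (share + 1e-6).log()).sum()

        # Hard top-k selection under the budget for inference.
        keep = torch.topk(gates, k=min(self.budget, gates.numel())).indices
        return gates, keep, budget_pen + 0.1 * diversity_pen
```

In this reading, the relaxed gates and penalty are used during training of the selector, while the hard top-k indices determine which passages are passed to the adapter-equipped transformer for reconstruction at inference time.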
