Towards Explainable RAG: Interpreting the Influence of Retrieved Passages on Generation
Abstract
Retrieval-Augmented Generation (RAG) models have achieved substantial improvements in natural language processing by incorporating external knowledge through document retrieval. However, their interpretability remains limited: users cannot easily understand how specific retrieved documents contribute to the generated response. In this paper, we propose a framework that improves transparency in RAG by analyzing and interpreting the influence of retrieved documents. The framework focuses on three key components: user embedding profiling, custom reward function design, and a soft prompt compression mechanism. We introduce new evaluation metrics for source attribution and influence alignment and validate them through comprehensive experiments on benchmark datasets. Our findings suggest that interpretability can be meaningfully improved without sacrificing generation quality.
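As a rough illustration of what passage-level influence analysis can look like, the sketch below scores each retrieved passage by how much ablating it changes the likelihood of the generated answer. This is a generic leave-one-out attribution sketch, not the paper's implementation; the `passage_influence` helper and the injected `log_likelihood_fn` scorer are hypothetical names introduced here for illustration only.

```python
# Hypothetical sketch: leave-one-out influence attribution for retrieved passages.
# `log_likelihood_fn` stands in for any scorer returning the (log-)likelihood of
# the generated answer given a prompt built from (query, passages); it is an
# assumption for illustration, not an API described in the paper.

from typing import Callable, List


def passage_influence(
    query: str,
    passages: List[str],
    answer: str,
    log_likelihood_fn: Callable[[str, List[str], str], float],
) -> List[float]:
    """Score each passage by how much removing it lowers the answer's likelihood."""
    full_score = log_likelihood_fn(query, passages, answer)
    influences = []
    for i in range(len(passages)):
        ablated = passages[:i] + passages[i + 1:]
        ablated_score = log_likelihood_fn(query, ablated, answer)
        # A larger drop in likelihood means passage i was more influential.
        influences.append(full_score - ablated_score)
    return influences


if __name__ == "__main__":
    # Toy scorer: counts word overlap between each passage and the answer
    # (a stand-in for a real model-based likelihood, used only to make the
    # sketch runnable).
    def toy_ll(query: str, passages: List[str], answer: str) -> float:
        norm = lambda s: {w.strip(".,!?").lower() for w in s.split()}
        answer_words = norm(answer)
        return float(sum(len(answer_words & norm(p)) for p in passages))

    scores = passage_influence(
        query="Who wrote Hamlet?",
        passages=[
            "Hamlet was written by William Shakespeare.",
            "Paris is in France.",
        ],
        answer="William Shakespeare wrote Hamlet.",
        log_likelihood_fn=toy_ll,
    )
    print(scores)  # [3.0, 0.0] -- the first passage is far more influential
```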