Augmenting Large Language Models with External Data Sources: A Systematic Review of Methodologies, Performance Metrics, and Information Fidelity
Abstract
Large Language Models (LLMs) have emerged as transformative tools across various domains, showcasing exceptional capabilities in natural language processing and generation. However, their reliance on static pre-training data limits their ability to access up-to-date and domain-specific information. Existing research often treats augmentation strategies in isolation, and limited effort has been made to compare them systematically through the lens of information integrity. This review focuses specifically on Retrieval-Augmented Generation (RAG) and Fine-tuning, identifying them as the two dominant paradigms for integrating external knowledge: RAG for retrieval-based context injection and Fine-tuning for parametric knowledge adaptation. While existing surveys predominantly focus on performance metrics such as accuracy and latency, this paper addresses the critical gap of data fidelity: the preservation of truthfulness, integrity, and fairness during augmentation. We systematically synthesise empirical findings from diverse methodologies to determine how each approach mitigates hallucinations and bias. By comparing the trade-offs between retrieval-based context injection and parametric knowledge adaptation, this survey offers a structured taxonomy, a unified evaluation framework, and actionable insights to guide future research and the practical deployment of robust, high-fidelity LLMs.