Augmenting Large Language Models with External Data Sources: A Systematic Review of Methodologies, Performance Metrics, and Information Fidelity
Abstract
Large Language Models (LLMs) have emerged as transformative tools across various domains, showcasing exceptional capabilities in natural language processing and generation. However, their reliance on static pre-training data limits their ability to access up-to-date and domain-specific information. Existing research often treats augmentation strategies in isolation, and limited effort has been made to compare them systematically through the lens of information integrity. This review focuses specifically on Retrieval-Augmented Generation (RAG) and Fine-tuning, identifying them as the two dominant paradigms for integrating external knowledge: RAG for retrieval-based context injection and Fine-tuning for parametric knowledge adaptation. While existing surveys predominantly focus on performance metrics such as accuracy and latency, this paper addresses the critical gap of data fidelity: the preservation of truthfulness, integrity, and fairness during augmentation. We systematically synthesise empirical findings from diverse methodologies to determine how each approach mitigates hallucinations and bias. By comparing the trade-offs between retrieval-based context injection and parametric knowledge adaptation, this survey offers a structured taxonomy, a unified evaluation framework, and actionable insights to guide future research and the practical deployment of robust, high-fidelity LLMs.