RAG Systems for Academic Research: Towards intelligent, secure and effective document management

Davide Richard Bramley

Abstract

The integration of large language models (LLMs) into academic and research contexts raises questions about how effectively they process texts, audio transcriptions and complex documents, and about decontextualization, hallucinations and the reliability of their responses. Local Retrieval-Augmented Generation (RAG) systems offer a strategic way to mitigate risks related to security, privacy and the decontextualization of information, giving greater control over the processed content and the provenance of sources. However, it remains unclear which model best meets the needs of a researcher working in a local environment. This study introduces the Multimodal Evaluation Framework for LLM (MEFL), a methodological model for testing the performance of different LLMs integrated into AnythingLLM, a platform that can be freely installed and used without a network connection. The experimental protocol is divided into five phases: (1) definition of the evaluation criteria; (2) selection and configuration of the models within the local environment; (3) construction of a representative dataset of academic and media materials; (4) measurement of performance on quantitative and qualitative metrics; (5) comparative analysis to identify the most effective solutions for the research context. The results indicate that integrating LLMs with local RAG can offer significant advantages in managing contextualized knowledge, improving the quality of answers and reducing hallucinations.
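The measurement and comparison phases (4 and 5) of the protocol described in the abstract could be sketched roughly as follows. This is an illustrative sketch only, not the MEFL implementation: the criterion names, the weights, the `Judgement` record, the model names and every score are hypothetical placeholders that do not come from the study, and a weighted average is just one plausible way to combine per-criterion scores.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical evaluation criteria and weights (phase 1); not taken from the study.
CRITERIA_WEIGHTS = {"faithfulness": 0.5, "relevance": 0.3, "completeness": 0.2}


@dataclass
class Judgement:
    model: str      # placeholder name of a locally hosted model (phase 2)
    document: str   # identifier of one dataset item (phase 3)
    scores: dict    # criterion name -> score in [0, 1] (phase 4)


def weighted_score(scores):
    """Combine per-criterion scores into one number using the weights above."""
    return sum(CRITERIA_WEIGHTS[name] * scores[name] for name in CRITERIA_WEIGHTS)


def rank_models(judgements):
    """Average the weighted score per model and sort best-first (phase 5)."""
    per_model = {}
    for j in judgements:
        per_model.setdefault(j.model, []).append(weighted_score(j.scores))
    averages = {model: mean(values) for model, values in per_model.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Placeholder values for illustration only; they are not results from the study.
judgements = [
    Judgement("model-a", "doc-1", {"faithfulness": 1.0, "relevance": 0.5, "completeness": 0.5}),
    Judgement("model-a", "doc-2", {"faithfulness": 0.5, "relevance": 0.5, "completeness": 0.5}),
    Judgement("model-b", "doc-1", {"faithfulness": 0.5, "relevance": 1.0, "completeness": 0.0}),
    Judgement("model-b", "doc-2", {"faithfulness": 0.0, "relevance": 0.5, "completeness": 0.5}),
]

ranking = rank_models(judgements)
for model, score in ranking:
    print(f"{model}: {score:.3f}")
```

In practice the per-criterion scores would come from human raters or automated metrics applied to each model's answers over the dataset; the ranking step itself stays the same regardless of how the scores are produced.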

Version published to 10.35542/osf.io/9r2qu_v1 on OSF Preprints
Mar 20, 2026

Related articles

Knowledge Grounded Conversational Access to Heterogeneous Institutional Documents via OCR Enabled Hybrid RAG

This article has 6 authors:
1. Veerababu Reddy
2. Krishnaveni Kaki
3. Bharath Kesineni
4. Iswarya Annapureddy
5. Harini Kanna
6. Gargesh Chalamala
This article has no evaluations. Latest version: Apr 2, 2026

Augmenting Large Language Models with External Data Sources: A Systematic Review of Methodologies, Performance Metrics, and Information Fidelity

This article has 4 authors:
1. Soham Mukherjee
2. John Le
3. Chau Nguyen
4. Thai Vu
This article has no evaluations. Latest version: Apr 10, 2026

From Product to Process: A Framework and Practical Toolkit for AI-Aware University Assessment

This article has 3 authors:
1. Luis F. Rivera-Galicia
2. Mónica Giménez-Baldazo
3. Carlos Mir-Fernández
This article has no evaluations. Latest version: Apr 15, 2026
