RAG Systems for Academic Research: Towards intelligent, secure and effective document management
Abstract
The integration of large language models (LLMs) into academic and research contexts raises questions about how well they process texts, audio transcriptions and complex documents. It also raises questions about decontextualization, hallucinations and the reliability of their responses. Adopting local Retrieval-Augmented Generation (RAG) systems is a strategic way to mitigate risks related to security, privacy and the decontextualization of information, because it gives greater control over the processed content and the provenance of sources. However, it remains unclear which model best meets a researcher's needs within a local environment. This study introduces the Multimodal Evaluation Framework for LLM (MEFL), a methodological model for testing the performance of different LLMs integrated into AnythingLLM, a platform that can be freely installed and used without a network connection. The experimental protocol is divided into five phases: (1) definition of the evaluation criteria; (2) selection and configuration of the models within the local environment; (3) construction of a representative dataset of academic and media materials; (4) performance measurement on quantitative and qualitative metrics; (5) comparative analysis to identify the most effective solutions for the research context. The results highlight how integrating LLMs with local RAG can offer significant advantages in managing contextualized knowledge, improving answer quality and reducing hallucinations.
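The abstract does not describe how AnythingLLM retrieves content internally. The following is therefore only a minimal, illustrative sketch of the retrieval step that a local RAG pipeline performs, and of how each retrieved chunk can carry its source identifier so that answers stay traceable. It assumes documents have already been split into `(source_id, text)` chunks. It also substitutes bag-of-words cosine similarity for the dense embedding search that a production system would typically use. All function names and the sample corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep alphanumeric word tokens only
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[tuple[str, str]], k: int = 2):
    # Score every (source_id, text) chunk against the query; return the top k
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), src, text) for src, text in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, hits) -> str:
    # Inline each retrieved chunk with its source tag so the model can cite provenance
    context = "\n".join(f"[{src}] {text}" for _, src, text in hits)
    return (
        "Answer using only the context below and cite sources in brackets.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical local corpus of pre-chunked documents
corpus = [
    ("interview_03.txt", "The participant described privacy concerns about cloud tools."),
    ("paper_12.pdf", "Retrieval augmented generation grounds answers in local documents."),
    ("notes.md", "Meeting scheduled for Thursday."),
]
query = "How does retrieval augmented generation ground answers?"
hits = retrieve(query, corpus, k=1)
print(hits[0][1])  # paper_12.pdf
print(build_prompt(query, hits))
```

Keeping the source tag attached to every chunk, rather than passing bare text to the model, lets a response be checked against the document it claims to rely on. This check is the provenance benefit that the abstract attributes to local RAG.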