MedRAGent: An Automatic Literature Retrieval and Screening System Utilizing Large Language Models with Retrieval-Augmented Generation
Abstract
Background
Systematic reviews play a critical role in synthesizing evidence across numerous studies, providing a foundation for informed decision-making in medical practice. However, the process is resource-intensive: it requires proficiency in constructing Boolean queries and screening extensive literature, tasks that are time-consuming and susceptible to inconsistencies, especially for non-expert researchers. While large language models (LLMs) offer a potential solution, their tendency to generate inaccurate or hallucinated content restricts their direct application in systematic reviews.
Objective
This study introduces and evaluates MedRAGent, a novel system that integrates LLMs with retrieval-augmented generation (RAG), designed to automate and enhance the efficiency and accuracy of Boolean query formulation and title/abstract screening in systematic reviews.
Methods
MedRAGent employs DeepSeek-V3-0324 and Kimi-K2-0711-preview LLMs within an RAG framework tailored for PubMed. The system utilizes the official Medical Subject Headings (MeSH) database to construct precise Boolean queries. For screening, it employs the LLMs with a structured prompt to automatically evaluate the relevance of retrieved articles based on predefined inclusion and exclusion criteria. Its performance was assessed using 53,054 articles from 6 research topics.
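To illustrate what a MeSH-based Boolean query for PubMed can look like, the following is a minimal sketch and not the authors' implementation. The function name `build_pubmed_query`, the dictionary keys `mesh` and `text`, and the example topic are all hypothetical; only the PubMed field tags `[MeSH Terms]` and `[Title/Abstract]` are standard PubMed search syntax. The sketch assumes that the MeSH headings and free-text synonyms for each concept have already been retrieved.

```python
def build_pubmed_query(concepts):
    """Combine concept groups into one PubMed Boolean query string.

    Each concept is a dict with two hypothetical keys:
      "mesh": MeSH headings, tagged with [MeSH Terms]
      "text": free-text synonyms, tagged with [Title/Abstract]
    Terms inside a concept are joined with OR; concepts are joined with AND.
    """
    groups = []
    for concept in concepts:
        terms = [f'"{t}"[MeSH Terms]' for t in concept.get("mesh", [])]
        terms += [f'"{t}"[Title/Abstract]' for t in concept.get("text", [])]
        if terms:  # skip concepts that have no terms at all
            groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


# Hypothetical example topic, not taken from the study.
example = build_pubmed_query([
    {"mesh": ["Diabetes Mellitus, Type 2"], "text": ["type 2 diabetes"]},
    {"mesh": ["Metformin"], "text": ["metformin"]},
])
print(example)
```

Joining synonyms with OR inside a concept and concepts with AND is the conventional structure for systematic-review searches; it favors recall within each concept while keeping the overall result set on topic.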
Results
MedRAGent achieved an overall precision of 0.0271, recall of 0.8308, and F1-score of 0.0525 in Boolean query construction. For automated literature screening, the system attained an overall sensitivity of 0.8131, specificity of 0.9891, and G-mean of 0.8968 when using DeepSeek-V3-0324 as the underlying LLM. Screening performance improved with Kimi-K2-0711-preview, which reached a sensitivity of 0.8582, specificity of 0.9919, and G-mean of 0.9226. The system processed 4,000-7,000 articles per day at low operational cost.
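The summary metrics reported above follow from the standard definitions: the F1-score is the harmonic mean of precision and recall, and the G-mean is the geometric mean of sensitivity and specificity. The short sketch below recomputes them from the reported component values as a consistency check; the function names are illustrative and not part of MedRAGent.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity * specificity)


# Component values as reported in the abstract.
f1_query = round(f1_score(0.0271, 0.8308), 4)    # Boolean query construction
gm_deepseek = round(g_mean(0.8131, 0.9891), 4)   # screening, DeepSeek-V3-0324
gm_kimi = round(g_mean(0.8582, 0.9919), 4)       # screening, Kimi-K2-0711-preview

print(f1_query, gm_deepseek, gm_kimi)  # 0.0525 0.8968 0.9226
```

The low F1-score for query construction is driven almost entirely by the low precision: the harmonic mean is dominated by the smaller of its two inputs, so a high-recall, low-precision search yields an F1 close to twice the precision.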
Conclusions
MedRAGent demonstrates strong potential for automating Boolean query construction and abstract-level screening in systematic reviews. It effectively accelerates literature processing, supporting researchers in conducting efficient and evidence-based medical reviews.