Accelerating Systematic Reviews with Large Language Models: Current Practices and Recommendations
Abstract
This study investigates the application of Large Language Models (LLMs) in systematic reviews, emphasizing their performance, consistency, and potential for cost efficiency. Through a systematic literature search, we included 76 studies from 734 articles. The findings reveal that LLMs demonstrate moderate to high performance in title and abstract screening, full-text screening, and data extraction. However, their performance is notably unstable in the literature search and quality assessment stages. Prompt design emerges as a crucial factor, with Chain of Thought (CoT) prompts frequently enhancing results. While LLMs exhibit moderate to high agreement with human reviewers in some stages, their consistency in quality assessment remains comparatively lower. The research suggests that although LLMs cannot fully replace human reviewers, they serve as valuable assistants in systematic reviews, especially in reducing time and effort. The study also provides practical recommendations for integrating LLMs effectively and discusses the challenges and future research directions in this evolving field.