Accelerating Systematic Reviews with Large Language Models: Current Practices and Recommendations

Abstract

This study investigates the application of Large Language Models (LLMs) in systematic reviews, focusing on their performance, consistency, and potential for cost efficiency. Through a systematic literature search, we included 76 studies from 734 retrieved articles. The findings reveal that LLMs demonstrate moderate to high performance in title and abstract screening, full-text screening, and data extraction, but their performance is notably unstable in the literature search and quality assessment stages. Prompt design emerges as a crucial factor, with Chain of Thought (CoT) prompts frequently enhancing results. While LLMs exhibit moderate to high agreement with human reviewers in some stages, their consistency in quality assessment remains comparatively low. The findings suggest that although LLMs cannot fully replace human reviewers, they serve as valuable assistants in systematic reviews, particularly by reducing time and effort. The study also provides practical recommendations for integrating LLMs effectively and discusses the challenges and future research directions in this evolving field.