Large Language Models in Systematic Review Screening: Opportunities, Challenges, and Methodological Considerations
Abstract
Systematic reviews depend on labor-intensive manual screening, a process prone to bottlenecks, delays, and scalability constraints in large-scale reviews. Large Language Models (LLMs) have recently emerged as a powerful alternative. Operating in zero-shot or few-shot modes, they can classify abstracts against predefined eligibility criteria without the continuous human intervention that semi-automated platforms require. This review focuses on the central challenges that users in the biomedical field encounter when integrating LLMs such as GPT-4 into evidence-based research. It examines critical requirements for software and data preprocessing, discusses various prompt strategies, and underscores the continued need for human oversight to maintain rigorous quality control. Drawing on current practices for cost management, reproducibility, and prompt refinement, this article highlights how review teams can substantially reduce screening workloads without compromising the comprehensiveness of evidence-based inquiry. The aim is to balance the strengths of LLM-driven automation with structured human checks, so that systematic reviews retain their methodological integrity while gaining the efficiency made possible by recent advances in artificial intelligence.
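To make the zero-shot/few-shot screening idea concrete, the following is a minimal sketch, not the workflow of any specific study. It assumes three things: a fixed prompt template built from the review's predefined criteria, a model that is asked to reply with one of three hypothetical labels (INCLUDE, EXCLUDE, UNCERTAIN), and a simple triage rule for human oversight. The names `Criteria`, `build_screening_prompt`, `parse_decision`, `needs_human_review`, and the `audit_every` parameter are all illustrative. No model API is called here; the prompt string would be sent to whichever LLM the team uses.

```python
from dataclasses import dataclass, field
from typing import Sequence

# Hypothetical label set the model is asked to choose from.
LABELS = {"INCLUDE", "EXCLUDE"}


@dataclass
class Criteria:
    """Hypothetical container for a review's predefined eligibility criteria."""
    inclusion: list[str] = field(default_factory=list)
    exclusion: list[str] = field(default_factory=list)


def build_screening_prompt(
    criteria: Criteria,
    title: str,
    abstract: str,
    examples: Sequence[tuple[str, str]] = (),
) -> str:
    """Zero-shot prompt when `examples` is empty; few-shot when labelled examples are given."""
    lines = ["You are screening records for a systematic review."]
    lines.append("Inclusion criteria:")
    lines += [f"- {c}" for c in criteria.inclusion]
    lines.append("Exclusion criteria:")
    lines += [f"- {c}" for c in criteria.exclusion]
    # Few-shot mode: prepend (abstract, decision) pairs labelled by human reviewers.
    for ex_abstract, ex_label in examples:
        lines.append(f"Example abstract: {ex_abstract}\nDecision: {ex_label}")
    lines.append(f"Title: {title}")
    lines.append(f"Abstract: {abstract}")
    lines.append("Answer with exactly one word: INCLUDE, EXCLUDE, or UNCERTAIN.")
    return "\n".join(lines)


def parse_decision(reply: str) -> str:
    """Map a free-text model reply to a label; anything unexpected becomes UNCERTAIN."""
    words = reply.split()
    if not words:
        return "UNCERTAIN"
    token = words[0].strip(".,:;*").upper()
    return token if token in LABELS else "UNCERTAIN"


def needs_human_review(label: str, record_index: int, audit_every: int = 10) -> bool:
    """Assumed triage policy: every non-EXCLUDE label goes to a human reviewer,
    and every `audit_every`-th EXCLUDE is also audited to catch missed studies."""
    if label != "EXCLUDE":
        return True
    return record_index % audit_every == 0
```

Keeping the prompt as a fixed template and forcing a constrained label set is one way to support the reproducibility and quality-control concerns raised above. Replies that cannot be parsed fall back to UNCERTAIN, and the periodic audit of EXCLUDE decisions is an assumed example of a structured human check, not a recommended sampling rate.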