Accelerating Systematic Reviews with Large Language Models: Current Practices and Recommendations
Abstract
This study investigates the application of Large Language Models (LLMs) in systematic reviews, emphasizing their performance, consistency, and potential for cost efficiency. Through a systematic literature search, we included 76 studies from 734 articles. The findings reveal that LLMs demonstrate moderate to high performance in title and abstract screening, full-text screening, and data extraction. However, their performance is notably unstable in the literature search and quality assessment stages. Prompt design emerges as a crucial factor, with Chain of Thought (CoT) prompts frequently enhancing results. While LLMs exhibit moderate to high agreement with human reviewers in some stages, their consistency in quality assessment remains comparatively lower. The research suggests that although LLMs cannot fully replace human reviewers, they serve as valuable assistants in systematic reviews, especially in reducing time and effort. The study also provides practical recommendations for integrating LLMs effectively and discusses the challenges and future research directions in this evolving field.