GPT API Models Can Function as Highly Reliable Second Screeners of Titles and Abstracts in Systematic Reviews



Abstract

Independent human double screening of titles and abstracts is a critical step for ensuring the quality of systematic reviews and the meta-analyses based on them. However, double screening is a resource-demanding procedure that decelerates the review process. To alleviate this issue, we evaluated the use of OpenAI's GPT API models as an alternative to human second screeners of titles and abstracts. We did so by developing a new benchmark scheme for interpreting the performance of automated screening tools against common human screening performances in high-quality systematic reviews, and by conducting three large-scale experiments on three psychological systematic reviews of differing complexity. Across all experiments, we show that GPT API models can perform on par with, and in some cases even better than, typical human screeners in detecting relevant studies, while also showing high exclusion performance. In addition, we introduce multi-prompt screening, that is, making one concise prompt per inclusion/exclusion criterion in a review, and show that it can be a valuable tool for screening in highly complex review settings. To support future reviews, we develop a reproducible workflow and tentative guidelines for when reviewers can or cannot use GPT API models as independent second screeners of titles and abstracts. Moreover, we present the R package AIscreenR to standardize and scale up the suggested application. Our ultimate aim is to make GPT API models acceptable as independent second screeners within high-quality systematic reviews, such as those published in Psychological Bulletin.
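To make the multi-prompt idea concrete, the sketch below shows one possible shape of the workflow in R: one concise prompt per inclusion/exclusion criterion, each sent to the GPT API for a given title and abstract, with a record retained only if every criterion-specific prompt returns an inclusion vote. This is illustrative only; the criterion prompts are hypothetical, the all-criteria-pass aggregation rule is an assumption, and the authors' AIscreenR package may implement the workflow differently.

```r
# Minimal sketch of multi-prompt screening against the OpenAI chat
# completions endpoint, using httr2. Prompts and the decision rule
# are assumptions for illustration, not the paper's exact setup.
library(httr2)

# One concise prompt per inclusion/exclusion criterion (hypothetical examples).
criteria_prompts <- c(
  population = "Does this study involve school-aged children? Answer 1 for yes, 0 for no.",
  design     = "Is this study a randomized controlled trial? Answer 1 for yes, 0 for no."
)

# Send a single criterion prompt plus the title/abstract to the GPT API.
ask_gpt <- function(prompt, title, abstract, model = "gpt-4o-mini") {
  request("https://api.openai.com/v1/chat/completions") |>
    req_auth_bearer_token(Sys.getenv("OPENAI_API_KEY")) |>
    req_body_json(list(
      model = model,
      messages = list(list(
        role = "user",
        content = paste0(prompt, "\n\nTitle: ", title, "\nAbstract: ", abstract)
      ))
    )) |>
    req_perform() |>
    resp_body_json() |>
    (\(x) x$choices[[1]]$message$content)()
}

# Assumed aggregation rule: include a record only if every
# criterion-specific prompt votes for inclusion ("1").
screen_record <- function(title, abstract) {
  votes <- vapply(criteria_prompts, ask_gpt, character(1),
                  title = title, abstract = abstract)
  all(trimws(votes) == "1")
}
```

In practice, the AIscreenR package is intended to standardize and scale up this kind of screening call; consult its documentation for the supported interface rather than the raw-API sketch above.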
