Capability of chatbots powered by large language models to support the screening process of scoping reviews: a feasibility study

Kim Nordmann
Michael Schaller
Stefanie Sauter
Florian Fischer


Abstract

The recent surge in publications increases the screening time required to maintain up-to-date and high-quality literature reviews. One of the most time-consuming phases is the screening of titles and abstracts. With the support of machine learning tools, this process has been semi-automated for systematic reviews, with limited success for scoping reviews. ChatGPT, a large language model, might support scoping review screening with its ability to identify key concepts and themes within texts. We hypothesise that ChatGPT’s performance in abstract screening surpasses that of the semi-automated tool Rayyan, increasing efficiency at acceptable costs while maintaining a low type II error. In our retrospective analysis, ChatGPT 4.0 decided upon 15 306 abstracts, vastly outperforming Rayyan. ChatGPT demonstrated high levels of accuracy (68%), specificity (67%) and sensitivity (88–89%) and a negative predictive value of 99% when compared to human researchers’ decisions. The workload savings were 64% at reasonable costs. Despite the promising results, human oversight remains paramount, as ChatGPT’s decisions resulted in an 11% false negative rate. A hybrid screening approach combining human raters and ChatGPT might ensure accuracy and quality while enhancing efficiency. Further research on ChatGPT’s parameters, the prompts and screening scenarios is necessary to validate these results and to develop a standardised approach.
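
The metrics named in the abstract (accuracy, specificity, sensitivity, negative predictive value and false negative rate) are conventionally derived from a 2×2 confusion matrix that compares the tool's include/exclude decisions with the human reference decisions. The following is a minimal sketch of those standard definitions, not the authors' analysis code. The names `ScreeningCounts` and `screening_metrics` are hypothetical, and the counts in the example are invented for illustration only; they are not the study's data. It assumes "positive" means "include the abstract" and that the false negative rate is taken relative to all abstracts the human reviewers included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Confusion-matrix counts; human decisions are the reference standard."""
    tp: int  # included by humans and by the tool
    fp: int  # excluded by humans, included by the tool
    tn: int  # excluded by humans and by the tool
    fn: int  # included by humans, excluded by the tool (missed records)


def screening_metrics(c: ScreeningCounts) -> dict[str, float]:
    """Return the standard screening metrics as fractions between 0 and 1."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        # Complement of sensitivity under the assumption stated above.
        "false_negative_rate": c.fn / (c.tp + c.fn),
    }


# Illustrative counts only (assumed, not taken from the study).
example = ScreeningCounts(tp=90, fp=300, tn=600, fn=10)
for name, value in screening_metrics(example).items():
    print(f"{name}: {value:.1%}")
```

Under these definitions the false negative rate and the sensitivity sum to 1, which is consistent with the reported pairing of an 11% false negative rate with a sensitivity of 88–89%. A high negative predictive value alongside a moderate specificity is typical when relevant abstracts are rare, because most excluded records are true exclusions.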

Version published to 10.21203/rs.3.rs-4687319/v1 on Research Square
Jul 31, 2024

Related articles

Exploring the Potential of Large Language Models: Can ChatGPT effectively assume the role of medical professionals by providing accurate and reliable responses in childhood cancer?

This article has 10 authors:
1. Kongkong Cui
2. Jie Lin
3. Zaihong Hu
4. Peng Hong
5. Zhiqiang Gao
6. Xiaomao Tian
7. Yu Wang
8. Feng Liu
9. Guanghui Wei
10. Qinlin Shi
This article has no evaluations. Latest version: Jul 19, 2024

Assessing the Response Strategies of Large Language Models Under Uncertainty: A Comparative Study Using Prompt Engineering

This article has 2 authors:
1. Nehoda Lainwright
2. Moyat Pemberton
This article has no evaluations. Latest version: Aug 1, 2024

Assessing ChatGPT 4.0’s Capabilities in The United Kingdom Medical Licensing Examination (UKMLA): A Robust Categorical Analysis

This article has 8 authors:
1. Octavi Casals-Farre
2. Ravanth Baskaran
3. Aditya Singh
4. Harmeena Kaur
5. Tazim Ul Hoque
6. Andreia Almeida
7. Marcus Coffey
8. Athanasios Hassoulas
This article has no evaluations. Latest version: Jul 24, 2024
