The efficiency and accuracy of Artificial Intelligence in conducting systematic reviews: A single case analysis
Abstract
Objectives
Artificial Intelligence (AI) tools present an opportunity to expedite the typically lengthy process of systematic reviews and meta-analyses, but more evidence is required on their performance in practice. This paper examined the use of ASReview and ChatGPT for screening, data extraction, and quality ratings against a traditional systematic review, exploring their efficiency, the accuracy of the information they produce, and the consistency of their judgements compared with human reviewers.
Methods
Three screening simulations were conducted using ASReview with different amounts of training data (1, 3, or 5 irrelevant/relevant records). A standardised set of prompts was developed and provided to ChatGPT for data extraction, and its outputs were coded for accuracy against the primary studies. For quality ratings, Cochrane's guidance for the Risk of Bias 2.0 (RoB 2) and Risk of Bias in Non-Randomised Studies of Interventions (ROBINS-I) tools was provided to ChatGPT, together with their respective templates.
Results
In all simulations, ASReview prioritised the relevant studies within the first 800 records (approximately 17% of the dataset). When extracting data, ChatGPT sometimes omitted information, though further detail was obtained with additional prompting. Few instances of inaccurate information were observed. Consistency of quality ratings was low to moderate, depending on the domain of bias.
Conclusions
AI tools can reduce the time required for screening by effectively prioritising relevant articles and may also support data extraction by quickly locating relevant information in a manuscript. They should, however, be approached with caution for more complex tasks (e.g., quality ratings). In any case, the use of AI requires careful testing and validation of outputs.
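As a rough illustration of the screening simulations described in the Methods, the sketch below re-runs ASReview's simulation mode with 1, 3, and 5 prior relevant and irrelevant records as training data. This is a minimal sketch, not the authors' actual pipeline: it assumes ASReview LAB v1.x is installed and that the labelled dataset is saved as records.csv (a hypothetical filename); exact flag names may differ between ASReview versions.

```python
import subprocess

# Sketch of the three screening simulations: ASReview's simulation mode is
# run with 1, 3, or 5 prior relevant and irrelevant records as training data.
# Assumes ASReview LAB v1.x (`pip install asreview`) and a labelled dataset
# saved as records.csv (hypothetical filename).
for n_prior in (1, 3, 5):
    subprocess.run(
        [
            "asreview", "simulate", "records.csv",
            "--state_file", f"simulation_{n_prior}.asreview",   # output log
            "--n_prior_included", str(n_prior),   # relevant training records
            "--n_prior_excluded", str(n_prior),   # irrelevant training records
        ],
        check=True,
    )
```

Each run writes a state file recording the order in which records were screened, from which a recall-by-screening-effort curve (e.g., whether all relevant studies fell within the first ~17% of records) could be reconstructed.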