Automating Abstract Screening in Research Synthesis using Large Language Models
Abstract
Screening abstracts is a crucial yet labor-intensive step in research synthesis projects such as systematic reviews and meta-analyses. Large Language Models (LLMs) offer an opportunity to streamline and automate this process. However, there is currently little experience or practical insight into how such automated workflows can be implemented in research practice. In this article, we illustrate an LLM-based abstract screening workflow using a set of human-rated abstracts from a recent meta-analysis. We describe how we developed and evaluated different prompting strategies and structured output formats, compared the performance of multiple LLMs, quantified model uncertainty, and automated the entire workflow within the R environment. We also provide R scripts and implementation guidance to support psychological researchers in adopting LLM-based workflows for research synthesis. Our comparisons show how different types of LLMs vary in accuracy relative to human raters, and how prompting strategies and hyperparameter settings affect model performance and uncertainty. We demonstrate that LLM-assisted screening can substantially reduce the time and cost of review preparation while maintaining accuracy comparable to human raters. At the same time, we emphasize that this work represents an initial step, and that ongoing refinement and validation are essential as LLM technologies and their applications continue to evolve rapidly.
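To illustrate the kind of workflow the abstract describes, the following is a minimal sketch of an LLM-based screening call in R. It is not the authors' implementation: the function name `screen_abstract`, the prompt wording, and the JSON output fields (`include`, `confidence`) are hypothetical, and it assumes access to an OpenAI-compatible chat completions API via the httr2 package.

```r
# Minimal sketch (assumptions: httr2 installed, OPENAI_API_KEY set,
# OpenAI-compatible endpoint; function and output schema are hypothetical).
library(httr2)

screen_abstract <- function(abstract_text,
                            model = "gpt-4o-mini",
                            api_key = Sys.getenv("OPENAI_API_KEY")) {
  # Prompt asking for a structured (JSON) screening decision.
  prompt <- paste(
    "You are screening abstracts for a meta-analysis.",
    "Reply only with a JSON object: {\"include\": true or false, \"confidence\": 0-1}.",
    "Abstract:", abstract_text
  )

  resp <- request("https://api.openai.com/v1/chat/completions") |>
    req_auth_bearer_token(api_key) |>
    req_body_json(list(
      model = model,
      temperature = 0,  # lower temperature to reduce run-to-run variability
      messages = list(list(role = "user", content = prompt))
    )) |>
    req_perform()

  # Return the raw model reply; parsing and validation would follow in practice.
  resp_body_json(resp)$choices[[1]]$message$content
}

# Example usage: screen_abstract("This randomized controlled trial examined ...")
```

In a full workflow, such a function would be applied over all retrieved abstracts (e.g., with `lapply` or `purrr::map`), the structured replies parsed, and the resulting decisions compared against human ratings to estimate accuracy and uncertainty, as the article describes.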