Using Elicit AI research assistant for data extraction in systematic reviews: a feasibility study across environmental and life sciences

Abstract

Data extraction in systematic reviews, maps and meta-analyses is time-consuming and prone to human error or subjective judgment. Large language models offer potential for automating this process, yet their performance has been evaluated in only a limited range of platforms, disciplines, and review types. We assessed the performance of the Elicit platform across diverse data extraction tasks using journal articles from seven systematic-like reviews in the life and environmental sciences. Human-extracted data served as the gold standard. For each review, we used eight articles for prompt development and another eight for testing. Initial prompts were iteratively refined until they exceeded 87% accuracy or for up to five rounds. We then tested extraction accuracy, reproducibility across user accounts, and the effect of Elicit's high-accuracy mode. Of the 90 prompts considered, 70 exceeded the 87% accuracy threshold when compared with gold-standard values, but accuracy tended to be lower when prompts were tested on a new set of articles. Repeating data extractions with different Elicit user accounts resulted in 90% agreement on extracted values, though supporting quotes and reasoning matched in only 46% and 30% of cases, respectively. In high-accuracy mode, value matches dropped to 77%, with just 10% quote matches and 0% reasoning matches. Extraction accuracy did not differ by data type. Elicit also helped identify eight (<1%) errors in the gold-standard data. Our results show that Elicit can complement, but not replace, human data extractors. Elicit may be best used as a secondary reviewer and to evaluate the clarity of data extraction protocols. Prompts must be fine-tuned and independently validated.
