Using Elicit AI research assistant for data extraction in systematic reviews: a feasibility study across environmental and life sciences

Abstract

Data extraction in systematic reviews, maps and meta-analyses is time-consuming and prone to human error or subjective judgment. Large language models offer potential for automating this process, yet their performance has been evaluated in only a limited range of platforms, disciplines, and review types. We assessed the performance of the Elicit platform across diverse data extraction tasks using journal articles from seven systematic-like reviews in the life and environmental sciences. Human-extracted data served as the gold standard. For each review, we used eight articles for prompt development and another eight for testing. Initial prompts were iteratively refined until they exceeded 87% accuracy or for up to five rounds. We then tested extraction accuracy, reproducibility across user accounts, and the effect of Elicit's high-accuracy mode. Of the 90 prompts considered, 70 exceeded the 87% accuracy threshold when compared with gold-standard values, but accuracy tended to be lower when prompts were tested on a new set of articles. Repeating data extractions with different Elicit user accounts resulted in 90% agreement on extracted values, though supporting quotes and reasoning matched in only 46% and 30% of cases, respectively. In high-accuracy mode, value matches dropped to 77%, with just 10% quote matches and 0% reasoning matches. Extraction accuracy did not differ by data type. Elicit also helped identify eight (<1%) errors in the gold-standard data. Our results show that Elicit can complement, but not replace, human data extractors. Elicit may be best used as a secondary reviewer and to evaluate the clarity of data extraction protocols. Prompts must be fine-tuned and independently validated.
