AI-Assisted Data Extraction with a Large Language Model: A Study Within Reviews
Abstract
Background
Data extraction is a critical but error-prone and labor-intensive task in evidence synthesis. Unlike other artificial intelligence (AI) technologies, large language models (LLMs) do not require labeled training data for data extraction.
Objective
To compare an AI-assisted data extraction process with a traditional, human-only process.
Design
Study within reviews (SWAR) using a prospective, parallel-group comparison with blinded data adjudicators.
Setting
Workflow validation within six ongoing systematic reviews of interventions under real-world conditions.
Intervention
Initial data extraction performed by an LLM (Claude versions 2.1, 3.0 Opus, and 3.5 Sonnet) and verified by a human reviewer.
Measurements
Concordance, time on task, accuracy, recall, precision, and error analysis.
Results
The six systematic reviews in the SWAR contributed 9,341 data elements extracted from 63 studies. Concordance between the two methods was 77.2%. Compared with enhanced human data extraction, the AI-assisted approach had an accuracy of 91.0%, a recall of 89.4%, and a precision of 98.9%. The AI-assisted approach had fewer incorrect extractions (9.0% vs. 11.0%) and a similar risk of major errors (2.5% vs. 2.7%) compared with the traditional human-only method, with a median time saving of 41 minutes per study. Missed data items were the most frequent errors in both approaches.
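To illustrate how the reported metrics relate, the sketch below computes accuracy, recall, and precision from hypothetical extraction counts. The counts and the exact denominator definitions are assumptions for demonstration only; the study's own classification scheme may differ.

```python
# Hypothetical sketch of accuracy/recall/precision for a data extraction task.
# The counts used below are invented for illustration, not the study's data.
def extraction_metrics(correct: int, missed: int, wrong: int):
    """Return (accuracy, recall, precision).

    correct -- extracted elements matching the reference standard
    missed  -- reference elements the extractor failed to capture
    wrong   -- extracted elements that do not match the reference
    """
    total = correct + missed + wrong
    accuracy = correct / total                 # share of all elements handled correctly (assumed definition)
    recall = correct / (correct + missed)      # sensitivity: how much available data was captured
    precision = correct / (correct + wrong)    # correctness of what was actually extracted
    return accuracy, recall, precision

# Example with made-up counts:
acc, rec, prec = extraction_metrics(correct=895, missed=95, wrong=10)
```

Note how a high precision with lower recall, as reported above, corresponds to an extractor that rarely produces wrong values but sometimes omits data items entirely.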
Limitations
Assessing the concordance of data extractions and classifying errors required subjective judgment. Tracking time on task consistently was challenging.
Conclusion
The use of an LLM can improve the accuracy of data extraction and save time in evidence synthesis. The results reinforce previous findings that human-only data extraction is prone to errors.
Primary Funding Source
US Agency for Healthcare Research and Quality, RTI International
Registration
SWAR28 Gerald Gartlehner (2023 FEB 11 2102).pdf