Large Language Models (LLMs) for Evidence Synthesis: An Exploratory Evaluation and A New Approach for Automated Data Extraction

Abstract

Large language models (LLMs) are increasingly used in scientific research for their strong general problem-solving capabilities. Data extraction remains one of the most time-consuming and labor-intensive steps in evidence synthesis (ES), making LLMs a promising tool for improving its efficiency and accuracy. Our study evaluates the performance of different LLMs and proposes a novel method, Divide, Conquer, then Recheck (DCR), to optimize LLM-based data extraction in ES. Multiple LLM foundation models were compared on accuracy, precision, recall, and F1-score. We find that GPT-4o performs notably better across most variables than ChatPDF, Bing Chat, and GPT-4. The proposed DCR method, powered by GPT-4o, achieved higher accuracy on most structured data extraction tasks, and a few-shot prompting strategy further improved performance on complex information (e.g., correlation coefficients). These findings highlight the potential of using LLMs in ES research.
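
For reference, the evaluation metrics named in the abstract have standard definitions in terms of true/false positives and negatives counted per extracted variable against a gold-standard extraction. The following sketch states those standard formulas; it is not taken from the article itself.

```latex
% Standard definitions of the reported metrics, in terms of
% true positives (TP), false positives (FP), true negatives (TN),
% and false negatives (FN) for each extracted variable.
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align*}
```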
