Large Language Models (LLMs) for Evidence Synthesis: An Exploratory Evaluation and A New Approach for Automated Data Extraction

Abstract

Large language models (LLMs) are increasingly used in scientific research for their strong general problem-solving capabilities. Data extraction remains one of the most time-consuming and labor-intensive steps in evidence synthesis (ES), making LLMs a promising tool for improving its efficiency and accuracy. Our study evaluates the performance of different LLMs and proposes a novel method, Divide, Conquer, then Recheck (DCR), to optimize LLM-based data extraction in ES. Multiple LLM foundation models were compared on accuracy, precision, recall, and F1-score. We find that GPT-4o performs notably better across most variables than ChatPDF, Bing Chat, and GPT-4. The proposed DCR method, powered by GPT-4o, achieved higher accuracy on most structured data extraction tasks, and a few-shot prompting strategy further improved performance on complex information (e.g., correlation coefficients). These findings highlight the potential of using LLMs in ES research.
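
For reference, the evaluation metrics named in the abstract have standard definitions in terms of true/false positives and negatives counted per extracted variable against a gold-standard extraction. The following sketch states those standard formulas; it is not taken from the article itself.

```latex
% Standard definitions of the reported metrics, in terms of
% true positives (TP), false positives (FP), true negatives (TN),
% and false negatives (FN) for each extracted variable.
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align*}
```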
