A Prompt-Based Tutorial for Large Language Model–Assisted Screening in Systematic Reviews and Meta-Analyses
Abstract
Title/abstract and full-text screening are among the most time-consuming stages of systematic reviews. Large language models (LLMs) such as ChatGPT can assist in screening, but most prior evaluations focus only on abstract-level decisions and require coding or API access, limiting practical application. This study aimed to develop and evaluate a structured, prompt-based approach that enables LLMs to perform both abstract and full-text screening without programming or API integration. Using datasets from two completed meta-analyses, we implemented a stepwise training framework involving comprehension checks, criterion-specific feedback, and iterative prompt refinement. Study 1 optimized technical parameters, including file format and batch size, using 1,000 abstracts and 50 full texts from a completed meta-analysis. Study 2 validated the approach in an independent dataset on teletherapy for depression (1,321 abstracts, 82 full texts). Human reviewers’ decisions served as the reference standard, and sensitivity, specificity, and accuracy were the primary outcomes. In Study 1, the LLM achieved 98.0% sensitivity and 80.6% specificity at abstract screening, with optimal performance using batches of 50 abstracts in plain-text format. In Study 2, abstract screening reached 100% sensitivity and 85.6% specificity, and full-text screening achieved 82.1% accuracy while correctly retaining all eligible studies. A structured, prompt-based approach allows LLMs to approximate human-level accuracy in both abstract and full-text screening, with high sensitivity and specificity. This method makes LLM-assisted screening more accessible to review teams. While human oversight remains essential to address false positives and ensure rigor, prompt-based LLM workflows can substantially reduce reviewer burden and accelerate evidence synthesis.
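The primary outcomes above (sensitivity, specificity, accuracy) are computed by comparing the LLM's include/exclude decisions against the human reviewers' reference standard. The following is a minimal illustrative sketch of that comparison, not the authors' own analysis code; the function and variable names are hypothetical.

```python
def screening_metrics(llm_decisions, human_decisions):
    """Compare LLM include/exclude decisions (True = include) against the
    human reference standard and return sensitivity, specificity, accuracy."""
    pairs = list(zip(llm_decisions, human_decisions))
    tp = sum(1 for llm, ref in pairs if llm and ref)          # eligible, retained
    tn = sum(1 for llm, ref in pairs if not llm and not ref)  # ineligible, excluded
    fp = sum(1 for llm, ref in pairs if llm and not ref)      # ineligible, retained
    fn = sum(1 for llm, ref in pairs if not llm and ref)      # eligible, missed
    sensitivity = tp / (tp + fn)          # share of eligible studies retained
    specificity = tn / (tn + fp)          # share of ineligible studies excluded
    accuracy = (tp + tn) / len(pairs)     # overall agreement with reviewers
    return sensitivity, specificity, accuracy

# Example: 4 records, one false positive by the LLM
sens, spec, acc = screening_metrics(
    llm_decisions=[True, True, False, False],
    human_decisions=[True, False, False, False],
)
# sens = 1.0, spec ≈ 0.667, acc = 0.75
```

In screening, sensitivity is the critical metric: a missed eligible study (false negative) cannot be recovered downstream, whereas false positives are caught by human reviewers at the next stage.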