To Include or Not to Include? A prescription from the pharmacy on how to use active learning assisted screening in systematic reviews

Rinus G Verdonschot
Tinne Dilles
Caitriona Cahir
Marjan De Graef
Renata Vesela Holis
Juliane Frydenlund
Petra Denig
Tamasine Grimes
Fatma G Karapinar-Carkit
Marieke Schor


Abstract

Background: Systematic reviews are critical for evidence-based decision-making but require significant manual effort during the screening stage, which is labor-intensive and prone to error. Active learning (AL)-assisted screening tools have emerged to address these challenges. However, guidance for using AL-assisted screening in systematic reviews - especially those employing broad search strategies with heterogeneous results - is limited. This study aims to assess the effectiveness and reliability of AL-assisted screening for large, heterogeneous datasets. Specifically, it evaluates the comprehensiveness and necessity of the recommended SAFE procedure, examines the influence of different labeling strategies, and investigates whether AL-assisted screening can aid in reducing manual screening errors.

Methods: Screening of four large, heterogeneous datasets from medication management systematic reviews was simulated using ASReview. The datasets ranged from 3475 to 16218 records. For these datasets, 0.08 to 1% of records were included in the final systematic review. Our simulations systematically varied all parameters defined by the SAFE procedure. Recall versus sampling behavior was analyzed, with a focus on the impact of parameter choices on retrieving records selected for full text inclusions and on reducing the number of records to be screened.

Results: AL-assisted screening can effectively reduce the number of records to screen by almost 90% without increasing the risk of missing relevant records in comparison to manual screening. For three of our datasets, the best performance (100% recall of full text includes and 89-90% reduction in the number of records to screen) is achieved when using the SAFE procedure in combination with the elas-u4 and elas-h3 models and full text labeling. This choice of parameters results in only 87% recall of full text includes for the remaining dataset (16218 records, 0.6% title/abstract includes, 0.08% full text includes). For this dataset, the best performance (100% recall, 90% screening reduction) is achieved when using the SAFE procedure with the simpler Naive Bayes model and TF-IDF feature extractor and title/abstract labeling.

Conclusions: AL-assisted screening can safely and effectively reduce the workload needed to screen the large, heterogeneous datasets common in medication management systematic reviews. We recommend the modified SAFE procedure using full-text labels and the elas models. If the estimated ratio of full text includes is very low, it may be more appropriate to use the original SAFE procedure with title/abstract labeling.
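The two headline metrics in the abstract, recall of full text includes and reduction in the number of records to screen, can be illustrated with a minimal sketch. It is not the authors' analysis code and does not use the ASReview API. The function name `recall_and_reduction`, the toy label sequence, and the stopping point are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def recall_and_reduction(
    labels_in_screening_order: Sequence[int], n_screened: int
) -> Tuple[float, float]:
    """Return (recall, screening reduction) for a simulated screening run.

    labels_in_screening_order: 1 for a relevant record (for example a
        full text include), 0 otherwise, in the order the records
        were presented to the reviewer (hypothetical input format).
    n_screened: number of records screened before stopping.
    """
    total = len(labels_in_screening_order)
    if not 0 <= n_screened <= total:
        raise ValueError("n_screened must be between 0 and the dataset size")

    total_relevant = sum(labels_in_screening_order)
    if total_relevant == 0:
        raise ValueError("recall is undefined without relevant records")

    # Relevant records found among those actually screened.
    found = sum(labels_in_screening_order[:n_screened])

    recall = found / total_relevant
    # Fraction of the dataset that never had to be screened.
    reduction = 1 - n_screened / total
    return recall, reduction


# Toy data, not from the study: 100 records, 3 relevant,
# all ranked within the first 10 records shown.
toy_order = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0] + [0] * 90
recall, reduction = recall_and_reduction(toy_order, n_screened=10)
print(f"recall={recall:.2f}, reduction={reduction:.2f}")
```

In this toy run, stopping after 10 of 100 records finds every relevant record while leaving 90% of the dataset unscreened, which is the kind of trade-off the recall versus sampling analysis examines.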

Version published to 10.1101/2025.09.26.25336705 on medRxiv
Sep 28, 2025

Updated Approach to Error Rates in Systematic Review Screening: Integrating Active Learning, Large Language Models, and Full-Text Screening Data

This article has 5 authors:
1. Rutger Chris Neeleman
2. Berke Yazan
3. Emily Westerbeek
4. Wouter van Ballegooijen
5. Rens van de Schoot
This article has no evaluations
Latest version Jan 26, 2026

How can we best communicate the findings of public health-related systematic reviews? A Study Within a Review (SWAR)

This article has 7 authors:
1. Niamh Gildernew
2. Mike Clarke
3. Miriam Brazzelli
4. Mari Imamura
5. Clare Robertson
6. Gianni Virgili
7. Sinead Noelle Duggan
This article has no evaluations
Latest version Dec 22, 2025

Screenathon 2.0: Human–AI Collaborative Screening Applied to Patient-Generated Health Data

This article has 11 authors:
1. Jonas Bergmann
2. Tiago Azzi
3. Rutger Chris Neeleman
4. Kianush Monschau
5. Elena Jalsovec
6. Emily Westerbeek
7. Felix Weijdema
8. Jonathan de Bruin
9. Qixiang Fang
10. Rens van de Schoot
11. Berke Yazan
This article has no evaluations
Latest version Jan 9, 2026
