To Include or Not to Include? A prescription from the pharmacy on how to use active learning assisted screening in systematic reviews
Abstract
Background: Systematic reviews are critical for evidence-based decision-making but require significant manual effort during the screening stage, which is labor-intensive and prone to error. Active learning (AL)-assisted screening tools have emerged to address these challenges. However, guidance for using AL-assisted screening in systematic reviews, especially those employing broad search strategies with heterogeneous results, is limited. This study aims to assess the effectiveness and reliability of AL-assisted screening for large, heterogeneous datasets. Specifically, it evaluates the comprehensiveness and necessity of the recommended SAFE procedure, examines the influence of different labeling strategies, and investigates whether AL-assisted screening can aid in reducing manual screening errors.

Methods: Screening of four large, heterogeneous datasets from medication management systematic reviews was simulated using ASReview. The datasets ranged from 3475 to 16218 records. For these datasets, 0.08 to 1% of records were included in the final systematic review. Our simulations systematically varied all parameters defined by the SAFE procedure. Recall versus sampling behavior was analyzed, with a focus on the impact of parameter choices on retrieving records selected for full text inclusions and on reducing the number of records to be screened.

Results: AL-assisted screening can effectively reduce the number of records to screen by almost 90% without increasing the risk of missing relevant records in comparison to manual screening. For three of our datasets, the best performance (100% recall of full text includes and 89-90% reduction in the number of records to screen) is achieved when using the SAFE procedure in combination with the elas-u4 and elas-h3 models and full text labeling. This choice of parameters results in only 87% recall of full text includes for the remaining dataset (16218 records, 0.6% title/abstract includes, 0.08% full text includes). For this dataset, the best performance (100% recall, 90% screening reduction) is achieved when using the SAFE procedure with the simpler Naive Bayes model and TF-IDF feature extractor and title/abstract labeling.

Conclusions: AL-assisted screening can safely and effectively reduce the workload needed to screen the large, heterogeneous datasets common in medication management systematic reviews. We recommend the modified SAFE procedure using full-text labels and the elas models. If the estimated ratio of full text includes is very low, it may be more appropriate to use the original SAFE procedure with title/abstract labeling.
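
The two outcome measures reported above, recall of includes and reduction in the number of records to screen, can be illustrated with a minimal Python sketch. It does not use the ASReview API and does not implement the SAFE procedure. The function name `screening_summary`, the toy label sequence, and the simple "stop after a run of consecutive irrelevant records" rule are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def screening_summary(
    labels_in_screening_order: Sequence[int],
    stop_after_consecutive_irrelevant: int,
) -> Tuple[int, float, float]:
    """Return (records screened, recall, screening reduction).

    labels_in_screening_order: 1 for a relevant record, 0 for an irrelevant
        one, in the order a (hypothetical) ranking model presented them.
    stop_after_consecutive_irrelevant: illustrative stopping rule, NOT the
        SAFE procedure; screening stops after this many irrelevant records
        in a row.
    """
    if not labels_in_screening_order:
        raise ValueError("at least one record is required")

    total_relevant = sum(labels_in_screening_order)
    found = 0
    streak = 0
    screened = 0

    for label in labels_in_screening_order:
        screened += 1
        if label:
            found += 1
            streak = 0  # a relevant record resets the irrelevant streak
        else:
            streak += 1
            if streak >= stop_after_consecutive_irrelevant:
                break

    # Recall: share of all relevant records found before stopping.
    recall = found / total_relevant if total_relevant else 1.0
    # Screening reduction: share of records that never had to be screened.
    reduction = 1 - screened / len(labels_in_screening_order)
    return screened, recall, reduction


# Toy example (made-up labels, not data from the study).
toy_labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
screened, recall, reduction = screening_summary(toy_labels, 3)
print(screened, recall, round(reduction, 2))
```

In this toy run the stopping rule fires before the last relevant record is reached, which is the kind of trade-off between recall and workload that the simulations in the study examine across parameter choices.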