M-PreSS: A Model Pre-training Approach for Study Screening in Systematic Reviews
Abstract
Background
Conducting a systematic review is labour-intensive and time-consuming, especially during the study screening process. Previous research has introduced traditional machine learning models (e.g. Support Vector Machines) to automate study screening, but such models generalise poorly across topics. Recent research has therefore explored the use of existing large language models (LLMs), such as ChatGPT/GPT-4, for study screening. However, the lack of transparency in training data and the inconsistency of their outputs make applying such large, commercial LLMs challenging in the context of systematic reviews, where transparency in methods is particularly important.
Results
We introduce an approach to fine-tune an open-source biomedical language model (BlueBERT) using a Siamese neural network [1] so that it can screen scientific literature databases across multiple research topics. We evaluate different training approaches on seven COVID-19 systematic reviews. The results indicate good generalisation across topics, with an average recall/sensitivity of 0.86 (minimum: 0.67, maximum: 1.00) and an average false positive rate of 6.48% (minimum: 1.38%, maximum: 11.41%). Furthermore, adding study selection criteria to the topic definition improves model performance (Area Under the Precision-Recall Curve [PRAUC]) by 2.74%, and adding more related review topics during training increases performance by 15.82%.
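For illustration, a Siamese pairing over a shared BlueBERT encoder might look like the minimal PyTorch sketch below. This is not the authors' released code: the HuggingFace checkpoint name, the [u; v; |u - v|] feature combination (a common Siamese scheme), and all identifiers are assumptions for the sake of a self-contained example.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Assumed BlueBERT checkpoint; any BERT-style biomedical encoder would do.
MODEL_NAME = "bionlp/bluebert_pubmed_uncased_L-12_H-768_A-12"

class SiameseScreener(nn.Module):
    """Score (review topic, candidate study) pairs with a shared encoder."""

    def __init__(self, model_name: str = MODEL_NAME):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Classify over [u; v; |u - v|], a standard Siamese feature scheme.
        self.classifier = nn.Linear(hidden * 3, 2)

    def encode(self, batch):
        # Use the [CLS] token embedding as the sequence representation.
        return self.encoder(**batch).last_hidden_state[:, 0]

    def forward(self, topic_batch, study_batch):
        u = self.encode(topic_batch)   # topic definition (+ selection criteria)
        v = self.encode(study_batch)   # candidate study title/abstract
        features = torch.cat([u, v, torch.abs(u - v)], dim=-1)
        return self.classifier(features)  # logits: exclude vs. include

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = SiameseScreener()
topic = tokenizer(["Efficacy of COVID-19 vaccines in adults"],
                  return_tensors="pt", truncation=True, padding=True)
study = tokenizer(["Title and abstract of a candidate study..."],
                  return_tensors="pt", truncation=True, padding=True)
logits = model(topic, study)  # fine-tune with cross-entropy on screening labels
```

Because the two inputs share one encoder, a model fine-tuned this way can score studies against a new review topic simply by swapping the topic text, which is consistent with the cross-topic generalisation reported above.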
Conclusions
Our results indicate that fine-tuning BlueBERT on study screening datasets can outperform ChatGPT/GPT-4 on two of three COVID-19 review topics reported in the literature, whilst allowing researchers to continue updating or extending the search for related evidence and substantially reducing computational resource requirements.