Machine Learning Reduced Workload for COVID-19 Literature
Abstract
Objectives: To develop, calibrate, and evaluate a machine learning (ML) classifier designed to reduce citation screening workload for COVID-19 research.
Methods: Citations from the WHO COVID-19 Research Database were used to train a logistic regression classifier, which was then calibrated using an independent dataset. Two thresholds were set: a lower threshold, below which records could be excluded from manual screening, and an upper threshold, above which records could be automatically included in the database. The classifier was then validated on eight COVID-19 reviews to assess workload reduction and recall.
Findings: The WHO-Cochrane-EPPI classifier was calibrated for 99% recall, with an upper threshold of 70% and a lower threshold of 39%. During validation, it achieved 98.6% recall across eight reviews. In a smaller set of three reviews, the estimated workload reduction was 12.5%, with 1,390 of 11,153 records falling below the lower threshold. For database use, workload reduction was estimated at 80%, with 8,955 of 11,153 records being screened automatically.
Conclusion: The WHO-Cochrane-UCL classifier significantly reduces manual screening workload with minimal risk of missing studies. It serves as a model for integrating machine learning into curated resources such as the WHO COVID-19 Research Database, improving sustainability by reducing manual screening effort. The classifier can also be adapted for systematic reviews on COVID-19 and other topics.
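The two-threshold routing and the workload figures in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `triage` and `workload_reduction` and the sample scores are hypothetical. Only the thresholds (39% and 70%) and the record counts come from the abstract. The sketch assumes that a score exactly equal to a threshold goes to manual screening, following the "below which" and "above which" wording.

```python
LOWER = 0.39  # lower threshold reported in the abstract
UPPER = 0.70  # upper threshold reported in the abstract


def triage(score: float, lower: float = LOWER, upper: float = UPPER) -> str:
    """Route one record by its classifier score (hypothetical helper)."""
    if score < lower:
        return "exclude"  # below the lower threshold: no manual screening
    if score > upper:
        return "include"  # above the upper threshold: added automatically
    return "manual"       # between the thresholds: screened by a person


def workload_reduction(auto_handled: int, total: int) -> float:
    """Percentage of records that need no manual screening."""
    return 100.0 * auto_handled / total


# Made-up scores, for illustration only.
scores = [0.12, 0.39, 0.55, 0.70, 0.93]
decisions = [triage(s) for s in scores]
print(decisions)

# Record counts taken from the abstract.
print(round(workload_reduction(1390, 11153), 1))  # review use
print(round(workload_reduction(8955, 11153), 1))  # database use
```

With these counts the sketch gives about 12.5% for review use and about 80.3% for database use, in line with the rounded figures reported in the findings.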