Application of Machine Learning in Prediction of COVID-19 Diagnosis for Indonesian Healthcare Workers

Abstract

The COVID-19 pandemic poses a heightened risk to health workers, especially in low- and middle-income countries such as Indonesia. Because mass RT-PCR testing of health workers is difficult to implement, high-performing and cost-effective methods are needed to help identify COVID-19-positive health workers and protect those at the front line of the pandemic response. This study investigated the use of machine learning classifiers to predict the risk of COVID-19 positivity (by RT-PCR) from a survey specific to health workers. Machine learning tools can enhance COVID-19 screening capacity in high-risk populations such as health workers in settings where cost limits access to adequate testing and screening supplies. We built two sets of COVID-19 Likelihood Meter (CLM) models: one trained and tested on data from a broad population of health workers in Jakarta and Semarang (full model), and one trained on health workers from Jakarta only (Jakarta model) and tested on both the same population and an independent population of Semarang health workers. Model performance was assessed with the area under the receiver operating characteristic curve (AUC), average precision (AP), and the Brier score (BS). Shapley additive explanations (SHAP) were used to analyse feature importance. The final dataset included 5,393 healthcare workers. For the full model, the random forest was selected as the algorithm of choice. It achieved a mean cross-validation AUC of 0.832 ± 0.015, AP of 0.513 ± 0.039, and BS of 0.124 ± 0.005, and performed well during testing, with an AUC of 0.849 and an AP of 0.51. The random forest classifier also showed the best and most robust performance for the Jakarta model, with an AUC of 0.856 ± 0.015, AP of 0.434 ± 0.039, and BS of 0.08 ± 0.0003. When tested on the Semarang healthcare workers, it achieved an AUC of 0.745 and an AP of 0.694, while on the Jakarta 2022 test set it achieved an AUC of 0.761 and an AP of 0.535. Our models yielded high predictive performance and can be used as an alternative COVID-19 screening methodology for healthcare workers in Indonesia, thereby helping to predict increasing trends in transmission during the transition to an endemic phase.
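To make the three reported metrics concrete, the following is a minimal sketch of what AUC, AP, and the Brier score each compute, written in plain Python. It is not the authors' code: the function names and the eight toy labels and probabilities are hypothetical, chosen only for illustration. In practice the same quantities are available in scikit-learn as `roc_auc_score`, `average_precision_score`, and `brier_score_loss`.

```python
def roc_auc(y_true, y_score):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """AP as precision at each threshold, weighted by the gain in recall."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    true_pos, prev_recall, ap = 0, 0.0, 0.0
    i = 0
    while i < len(ranked):
        # Samples sharing one score fall under the same threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            true_pos += ranked[j][1]
            j += 1
        recall = true_pos / n_pos
        precision = true_pos / j
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = RT-PCR positive, 0 = negative.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]

auc = roc_auc(y_true, y_prob)
ap = average_precision(y_true, y_prob)
bs = brier_score(y_true, y_prob)
print(f"AUC={auc:.3f}  AP={ap:.3f}  BS={bs:.3f}")
```

AUC and AP depend only on how the model ranks workers, whereas the Brier score also penalises poorly calibrated probabilities, which is one reason a study might report all three.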

SciScore for 10.1101/2021.10.15.21265021:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
These were implemented using the scikit-learn Python library [33], while XGBoost [34] was implemented using scikit-learn compatible packages in Python.	scikit-learn suggested: (scikit-learn, RRID:SCR_002577)
Feature importance and model interpretability was assessed using Shapley additive explanations (SHAP) from the SHAP package [36] in Python.	Python suggested: (IPython, RRID:SCR_001658)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

This study has several limitations. The first limitation is the self-reported nature of the survey, which poses risks of over or underreporting. Methods of fraud detection and error handling must be applied if using the model in real-time. The authority for hospitals to recommend their staff to be included in the study may introduce selection bias in the data collection process. Additionally, recall bias may be introduced in answers to retrospective questions in the survey, such as symptoms within the previous 14 days. Future work for the study includes collecting more data for these models, as well as investigating models for using CLM survey and other data to predict additional outcomes, such as hospitalization and mortality. Recruitment of health workers for the study also will expand to several other provinces within Indonesia. Notwithstanding the limitations, our results demonstrate predictive capability for COVID-19 in health workers using machine learning. Our preliminary models showed high predictive performance, especially when trained and tested on similar population groups. When used in practice, CLM can be tuned by training on local populations that resemble target populations in which it will be used. The models can potentially be used to prioritize RT-PCR testing in regions where diagnostic resources are scarce. Allocating testing using the model predictions may lead to reductions in the challenges health workers in Indonesia are facing due to the pandemic. Our ...
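The idea of prioritising RT-PCR testing by model prediction can be sketched as a simple ranking step: given a predicted risk per worker and a fixed number of available tests, select the highest-risk workers first. This is an illustrative sketch only; the function name, worker IDs, risk values, and test budget are all hypothetical and do not come from the study.

```python
def prioritize_for_testing(risk_by_worker, n_tests):
    """Return the IDs of the n_tests workers with the highest predicted risk."""
    ranked = sorted(risk_by_worker.items(), key=lambda item: item[1], reverse=True)
    return [worker_id for worker_id, _ in ranked[:n_tests]]


# Hypothetical model outputs: predicted probability of a positive RT-PCR result.
predicted_risk = {
    "hw-001": 0.12,
    "hw-002": 0.81,
    "hw-003": 0.47,
    "hw-004": 0.05,
    "hw-005": 0.66,
}

# Assume only two RT-PCR tests are available today.
selected = prioritize_for_testing(predicted_risk, n_tests=2)
print(selected)
```

A real deployment would also need the safeguards noted above for self-reported data, such as error handling and fraud detection, before acting on the ranking.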

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.
