Prediction of severe COVID-19 infection at the time of testing: A machine learning approach



Abstract

Early and effective detection of severe infection cases during a pandemic can significantly help patient prognosis and resource allocation. We develop a machine learning framework for detecting severe COVID-19 cases at the time of RT-PCR testing. We retrospectively studied 988 patients from a small Canadian province who tested positive for SARS-CoV-2, of whom 42 (4%) were at-risk cases (i.e., resulted in hospitalization, admission to ICU, or death) and 8 (< 1%) died. The limited information available at the time of RT-PCR testing included age, comorbidities, and patients’ reported symptoms, totaling 27 features. Vaccination status was unavailable. Due to the severe class imbalance and small dataset size, we formulated the problem of detecting severe COVID-19 as anomaly detection and applied three models: one-class support vector machine (OCSVM), weight-adjusted XGBoost, and weight-adjusted AdaBoost. The OCSVM was the best-performing model for detecting the deceased cases, with an average 95% true positive rate (TPR) and 27.2% false positive rate (FPR). Meanwhile, XGBoost provided the best performance for detecting the at-risk cases, with an average 96.2% TPR and 19% FPR. In addition, we developed a novel extension to SHAP interpretability to explain the outputs from the models. In agreement with conventional knowledge, we found that comorbidities were influential in predicting severity; however, we also found that symptoms were generally more influential, noting that machine learning combines all available data and is not a single-variate statistical analysis.
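To make the reported quantities concrete, the following is a minimal sketch, not the authors' code, of how TPR and FPR are computed and how severe the class imbalance is for the cohort sizes given in the abstract. The helper names `tpr_fpr` and `positive_class_weight` are hypothetical, the toy labels are invented for illustration, and the negative-to-positive ratio is only one common heuristic for re-weighting a rare class (for example via XGBoost's `scale_pos_weight` parameter). The abstract does not state how the weights were actually set.

```python
from typing import Sequence, Tuple


def tpr_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (TPR, FPR) for binary labels, where 1 marks a severe case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # severe, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # severe, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-severe, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-severe, not flagged
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


def positive_class_weight(n_total: int, n_positive: int) -> float:
    """Negative-to-positive ratio: an assumed heuristic for weighting the rare class."""
    return (n_total - n_positive) / n_positive


# Cohort sizes taken from the abstract.
N_PATIENTS = 988
N_AT_RISK = 42
N_DECEASED = 8

w_at_risk = positive_class_weight(N_PATIENTS, N_AT_RISK)    # 946 / 42
w_deceased = positive_class_weight(N_PATIENTS, N_DECEASED)  # 980 / 8

# Toy labels (invented for illustration, not study data).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)

print(f"at-risk weight: {w_at_risk:.1f}, deceased weight: {w_deceased:.1f}")
print(f"toy TPR: {tpr:.2f}, toy FPR: {fpr:.2f}")
```

Under this heuristic, each at-risk case would count roughly 22 times as much as a non-severe case, and each deceased case more than 120 times as much, which illustrates why the abstract treats the task as anomaly detection rather than ordinary balanced classification.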

Article activity feed

  1. SciScore for 10.1101/2021.10.15.21264970:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.