Evaluation of machine learning for predicting COVID-19 outcomes from a national electronic medical records database

This article has been reviewed by the following groups


Abstract

Objective

When novel diseases such as COVID-19 emerge, predictors of clinical outcomes might be unknown. Using data from electronic medical records (EMR) allows evaluation of potential predictors without selecting specific features a priori for a model. We evaluated different machine learning models for predicting outcomes among COVID-19 inpatients using raw EMR data.

Materials and Methods

In the Premier Healthcare Database Special Release: COVID-19 Edition (PHD-SR COVID-19, release date March 24, 2021), we included patients admitted with COVID-19 from February 2020 through April 2021 and built time-ordered medical histories. Setting the prediction horizon at 24 hours into the first COVID-19 inpatient visit, we aimed to predict intensive care unit (ICU) admission, hyperinflammatory syndrome (HS), and death. We evaluated the following models: L2-penalized logistic regression, random forest, gradient boosting classifier, deep averaging network, and recurrent neural network with a long short-term memory (LSTM) cell.
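The paper reports implementing its baseline models in scikit-learn with default hyperparameters (see the Resources table in the SciScore report below). As a purely illustrative sketch of the three non-deep baselines, assuming synthetic placeholder data rather than the study's actual EMR-derived features:

```python
# Minimal sketch of the three baseline model families named above,
# using scikit-learn defaults as the paper reports. The feature
# matrix X and binary outcome y (e.g., ICU admission within the
# 24-hour horizon) are synthetic stand-ins, not the study's data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))      # stand-in for EMR-derived features
y = rng.integers(0, 2, size=1000)    # stand-in for a binary outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "l2_logistic": LogisticRegression(penalty="l2", max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
```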

Results

We identified 57,355 COVID-19 patients in PHD-SR COVID-19. ICU admission was the easiest outcome to predict (best AUC=79%), and HS was the hardest to predict (best AUC=70%). Models performed similarly within each outcome.
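AUC here is the area under the receiver operating characteristic curve, which the paper reports as a percentage. A minimal sketch of how such a score can be computed from held-out predicted probabilities, again with synthetic placeholder data:

```python
# Minimal sketch of the evaluation metric reported above: area under
# the ROC curve (AUC), computed from held-out predicted probabilities.
# Data and model are synthetic placeholders, not the study's pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
y = rng.integers(0, 2, size=1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
p_test = clf.predict_proba(X_test)[:, 1]   # probability of the positive class
print(f"AUC = {roc_auc_score(y_test, p_test):.2f}")
```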

Discussion

Although the models learned to attend to meaningful clinical information, they performed similarly, suggesting performance limitations are inherent to the data.

Conclusion

Predictive models using raw EMR data are promising because they can use many observations and encompass a large feature space; however, traditional and deep learning models may perform similarly when few features are available at the individual patient level.

Article activity feed

  1. SciScore for 10.1101/2022.04.13.22273835:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: "We implemented these models in scikit-learn [26] and trained them with their default hyperparameters unless otherwise noted."
    Suggested resource: scikit-learn (RRID:SCR_002577)

    Sentence: "Software: We used R 4.1.0 [28] and SQL for data extraction and preprocessing, and we used Python 3.8 for feature extraction, modeling, and statistical analysis."
    Suggested resource: Python (IPython, RRID:SCR_001658)

    Sentence: "Baseline models were built in scikit-learn, and deep models were built using Keras [22] with the TensorFlow 2.0 backend [29]."
    Suggested resource: TensorFlow (tensorflow, RRID:SCR_016345)
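
    The table above quotes the paper's note that deep models were built in Keras with the TensorFlow 2.0 backend. Below is a minimal, hypothetical sketch of an LSTM classifier in that style; the vocabulary size, sequence length, and layer sizes are illustrative assumptions, not the paper's actual architecture or features.

    ```python
    # Hypothetical sketch only: an LSTM binary classifier in Keras
    # (TensorFlow 2.x). Shapes and hyperparameters are assumptions,
    # not the paper's actual architecture.
    import numpy as np
    from tensorflow import keras

    vocab_size, max_len = 5000, 200    # assumed vocabulary and history length
    model = keras.Sequential([
        keras.layers.Embedding(vocab_size, 64),       # embed coded medical events
        keras.layers.LSTM(128),                       # long short-term memory layer
        keras.layers.Dense(1, activation="sigmoid"),  # probability of the outcome
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC()])

    # Synthetic integer sequences standing in for time-ordered medical histories
    X = np.random.randint(1, vocab_size, size=(256, max_len))
    y = np.random.randint(0, 2, size=(256,))
    model.fit(X, y, epochs=1, batch_size=32, verbose=0)
    ```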

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (Research Resource Identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.