Development and validation of a machine learning model predicting illness trajectory and hospital utilization of COVID-19 patients: A nationwide study

Abstract

Objective

The spread of coronavirus disease 2019 (COVID-19) has led to severe strain on hospital capacity in many countries. We aim to develop a model helping planners assess expected COVID-19 hospital resource utilization based on individual patient characteristics.

Materials and Methods

We develop a model of patient clinical course based on an advanced multistate survival model. The model predicts the patient's disease course in terms of clinical states—critical, severe, or moderate. The model also predicts hospital utilization on the level of entire hospitals or healthcare systems. We cross-validated the model using a nationwide registry following the day-by-day clinical status of all hospitalized COVID-19 patients in Israel from March 1 to May 2, 2020 (n = 2703).
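To make the idea of a multistate clinical course concrete, here is a minimal sketch in Python. It is not the authors' model: it swaps the multistate survival model for a simple discrete-time Markov chain, ignores age and sex, and uses made-up daily transition probabilities. The names `DAILY_TRANSITIONS`, `simulate_trajectory`, and `expected_occupancy` are hypothetical and exist only for this illustration.

```python
import random

# Absorbing states end a hospital stay.
ABSORBING = {"discharged", "deceased"}

# Hypothetical daily transition probabilities (each row sums to 1).
# Illustrative numbers only; they are NOT estimates from the study.
DAILY_TRANSITIONS = {
    "moderate": {"moderate": 0.80, "severe": 0.05, "critical": 0.01,
                 "discharged": 0.14, "deceased": 0.00},
    "severe":   {"moderate": 0.10, "severe": 0.78, "critical": 0.08,
                 "discharged": 0.03, "deceased": 0.01},
    "critical": {"moderate": 0.00, "severe": 0.07, "critical": 0.88,
                 "discharged": 0.00, "deceased": 0.05},
}


def simulate_trajectory(start_state, max_days, rng):
    """Return the states on day 0, 1, ... until absorption or max_days."""
    path = [start_state]
    state = start_state
    for _ in range(max_days):
        if state in ABSORBING:
            break
        row = DAILY_TRANSITIONS[state]
        state = rng.choices(list(row.keys()), weights=list(row.values()))[0]
        path.append(state)
    return path


def expected_occupancy(start_states, max_days, n_runs=200, seed=0):
    """Average total and critical bed counts per day over n_runs simulations."""
    rng = random.Random(seed)
    total = [0.0] * (max_days + 1)
    critical = [0.0] * (max_days + 1)
    for _ in range(n_runs):
        for start in start_states:
            path = simulate_trajectory(start, max_days, rng)
            for day, state in enumerate(path):
                if state not in ABSORBING:
                    total[day] += 1
                    if state == "critical":
                        critical[day] += 1
    return [t / n_runs for t in total], [c / n_runs for c in critical]


# Example: three currently hospitalized patients, followed for 14 days.
total_beds, critical_beds = expected_occupancy(
    ["moderate", "severe", "critical"], max_days=14
)
print(total_beds)
print(critical_beds)
```

Averaging simulated patient trajectories is one simple way to turn individual-level state predictions into hospital-level bed counts, which is the link between the two kinds of prediction described above.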

Results

Per-day mean absolute errors for predicted total and critical care hospital bed utilization were 4.72 ± 1.07 and 1.68 ± 0.40, respectively, over cohorts of 330 hospitalized patients; areas under the curve for prediction of critical illness and in-hospital mortality were 0.88 ± 0.04 and 0.96 ± 0.04, respectively. We further present the impact of patient influx scenarios on day-by-day healthcare system utilization. We provide an accompanying R software package.
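For readers unfamiliar with the two metrics reported here, the following Python sketch shows one common way to compute them. The abstract does not spell out the exact evaluation procedure, so this is an assumption: the per-day mean absolute error is taken as the average absolute difference between predicted and observed daily bed counts, and the area under the curve is computed with the rank-based (Mann-Whitney) formulation. The function names and the example numbers are hypothetical.

```python
def per_day_mae(predicted, observed):
    """Mean absolute difference between predicted and observed daily bed counts."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up numbers, for illustration only.
print(per_day_mae([10, 12, 15], [11, 12, 13]))
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```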

Discussion

The proposed model accurately predicts total and critical care hospital utilization. The model enables evaluating impacts of patient influx scenarios on utilization, accounting for the state of currently hospitalized patients and characteristics of incoming patients. We show that accurate hospital load predictions were possible using only a patient’s age, sex, and day-by-day clinical state (critical, severe, or moderate).
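The effect of a patient influx scenario on bed occupancy can be illustrated with a small Python sketch. It makes strong simplifying assumptions that are not taken from the paper: every patient follows the same length-of-stay curve, patients already in hospital are ignored, and the probabilities in `STILL_IN_HOSPITAL` are invented. The expected census on a given day is then the sum, over all earlier admission days, of admissions multiplied by the probability of still being hospitalized.

```python
def expected_census(arrivals, still_in_hospital):
    """Expected number of occupied beds per day.

    arrivals[d]          -- admissions on day d
    still_in_hospital[k] -- probability that a patient is still hospitalized
                            k days after admission (treated as 0 beyond the list)
    """
    census = []
    for t in range(len(arrivals)):
        total = 0.0
        for d in range(t + 1):
            k = t - d  # days since admission for the cohort admitted on day d
            if k < len(still_in_hospital):
                total += arrivals[d] * still_in_hospital[k]
        census.append(total)
    return census


# Hypothetical length-of-stay curve; NOT derived from the Israeli registry.
STILL_IN_HOSPITAL = [1.0, 0.9, 0.75, 0.6, 0.45, 0.3, 0.2, 0.1, 0.05]

# Two hypothetical influx scenarios over 14 days.
steady = [10] * 14
surge = [10] * 7 + [20] * 7

for name, scenario in [("steady", steady), ("surge", surge)]:
    census = expected_census(scenario, STILL_IN_HOSPITAL)
    print(name, [round(c, 1) for c in census])
```

Comparing the two printed series shows how a doubling of admissions in the second week feeds through to occupancy with a delay set by the length-of-stay curve.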

Conclusions

The multistate model we develop is a powerful tool for predicting individual-level patient outcomes and hospital-level utilization.

Article activity feed

  1. SciScore for 10.1101/2020.09.04.20185645:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our model has several limitations. First, it is based on data from the first wave of patients in Israel. As treatment strategies and hospitalization policies differ over time and between health systems and hospitals, we cannot guarantee that LOS statistics will be the same across all locales and times. Thus, when possible we encourage planners to use the attached software package and fit it to their own hospitalization data. We will update the software package and app as more updated data will become available from the Israeli registry. A second limitation is that our model relies on estimation of the frequency and characteristics of future incoming patients. If arriving patient populations – both patient type and patient numbers – will differ significantly from the scenarios taken into account, the model’s predictions will be wrong. We thus recommend that planners evaluate multiple hypotheticals for incoming patients, testing for scenarios such as the ones we presented in the Results section above. A third limitation is that the model does not take into account patients’ comorbidities [21–23]. On the one hand, our model achieves good results while analyzing only a limited number of covariates as input; on the other hand, it is possible that using comorbidities could enhance the model’s performance. We also wish to point out that researchers with access to patient-level comorbidity data can easily incorporate it into a multistate model using the software we provide. A fourth limi...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a protocol registration statement.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.