Comparing machine learning algorithms for predicting ICU admission and mortality in COVID-19

This article has been reviewed by the following groups


Abstract

As predicting the trajectory of COVID-19 is challenging, machine learning models could assist physicians in identifying high-risk individuals. This study compares the performance of 18 machine learning algorithms for predicting ICU admission and mortality among COVID-19 patients. Using COVID-19 patient data from the Mass General Brigham (MGB) Healthcare database, we developed and internally validated models using patients presenting to the Emergency Department (ED) between March and April 2020 (n = 3597) and further validated them using temporally distinct individuals who presented to the ED between May and August 2020 (n = 1711). We show that ensemble-based models perform better than other model types at predicting both 5-day ICU admission and 28-day mortality from COVID-19. CRP, LDH, and O₂ saturation were important for the ICU admission models, whereas eGFR < 60 ml/min/1.73 m², neutrophil percentage, and lymphocyte percentage were the most important variables for predicting mortality. Implementing such models could help in clinical decision-making for future infectious disease outbreaks, including COVID-19.
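The temporal validation design described above (develop on earlier ED presentations, validate on later ones) can be sketched as follows. This is a minimal, stdlib-only illustration, not the authors' code, which is available only on request: the record layout, the `temporal_split` and `auroc` helpers, the 1 May 2020 cutoff, and all scores are hypothetical, and AUROC is assumed here as an example metric for comparing models.

```python
from datetime import date

# Hypothetical records: (ED presentation date, model risk score, outcome 0/1).
# Values are invented for illustration and are not study data.
records = [
    (date(2020, 3, 15), 0.90, 1),
    (date(2020, 4, 2), 0.20, 0),
    (date(2020, 4, 20), 0.60, 1),
    (date(2020, 5, 10), 0.80, 1),
    (date(2020, 6, 1), 0.30, 0),
    (date(2020, 7, 4), 0.85, 0),
]


def temporal_split(rows, cutoff):
    """Split rows into a development set (presented before `cutoff`) and a
    temporally distinct validation set (presented on or after `cutoff`)."""
    dev = [r for r in rows if r[0] < cutoff]
    val = [r for r in rows if r[0] >= cutoff]
    return dev, val


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cutoff separating the March-April and May-August periods.
dev, val = temporal_split(records, date(2020, 5, 1))
val_auc = auroc([r[1] for r in val], [r[2] for r in val])
print(len(dev), len(val), val_auc)  # prints: 3 3 0.5
```

Splitting by presentation date rather than at random mimics applying a model trained on earlier patients to later ones, which is what exposes drift such as changing treatment practice. A real analysis would more likely use a library metric such as scikit-learn's `roc_auc_score`.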

Article activity feed

  1. SciScore for 10.1101/2020.11.20.20235598

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: The programming code for R and Python is available upon request addressed to the corresponding author (jain(at)steele.mgh.harvard.edu).
    Resource: Python (suggested: IPython, RRID:SCR_001658)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    There are a number of limitations in our study. The lack of complete laboratory values for all patients necessitated the exclusion of a large number of patients and the removal of some variables during model development. As suggested by Jakobsen et al.²⁴, imputation is not advisable for handling missingness when the percentage of missing data exceeds 40%. The majority of individuals (>98%) included in our analysis were patients who visited the ED and were subsequently admitted as inpatients. Among the patients excluded because of missingness, only ∼40% needed inpatient care. This discrepancy in severity might explain the lack of laboratory values in excluded patients. Another limitation is that, as some laboratory values may take hours to be reported, the data may not be available until after the patient has left the ED, decreasing the utility of these predictors in triaging patient disposition. Similarly, as the mortality model uses ventilator use as a predictor, it requires ICU admission and would not be valid in an earlier phase of care. We also observed that predictive performance on the external cohort (an imbalanced dataset) was higher for the ICU admission models than for the mortality models. This could be due to changes introduced in the ICU during the later period of the pandemic: changes in treatment regimens may have affected mortality and, in turn, the predictive power of our models. Our...
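The two-step handling of missing data described in this limitation (removing variables whose missingness is too high to impute, then excluding patients with incomplete values) might look like the stdlib-only sketch below. Only the 40% threshold comes from the quoted text; applying it per variable, the helper names, the variable names, and all values are hypothetical, and this is not the authors' pipeline.

```python
# Threshold taken from the limitation text: imputation is treated as
# inadvisable when more than 40% of a variable's values are missing.
# Applying it per variable is an assumption made for this sketch.
MAX_MISSING_FRACTION = 0.40

# Hypothetical per-patient values (None = missing); not study data.
table = {
    "crp": [5.0, None, 12.1, 8.3, 20.0],
    "ldh": [None, None, None, 250.0, 300.0],
    "spo2": [95, 92, 88, 97, 99],
}


def missing_fraction(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def screen_variables(data, threshold=MAX_MISSING_FRACTION):
    """Return (kept, dropped) lists of variable names; a variable is dropped
    when its missing fraction exceeds `threshold`."""
    kept, dropped = [], []
    for name, values in data.items():
        (dropped if missing_fraction(values) > threshold else kept).append(name)
    return kept, dropped


def complete_cases(data, variables):
    """Indices of patients with no missing value in any of `variables`."""
    n_patients = len(next(iter(data.values())))
    return [i for i in range(n_patients)
            if all(data[v][i] is not None for v in variables)]


kept, dropped = screen_variables(table)
rows = complete_cases(table, kept)
print(kept, dropped, rows)  # prints: ['crp', 'spo2'] ['ldh'] [0, 2, 3, 4]
```

The complete-case step is what shrinks the cohort. Because the excluded patients in the study were less often admitted, this kind of exclusion can shift the development sample toward more severe cases.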

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a protocol registration statement.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.