An ensemble prediction model for COVID-19 mortality risk

Abstract

Background

It is critical to identify COVID-19 patients at higher risk of death at an early stage so that they can receive appropriate hospitalization or intensive care. However, thus far, no machine learning model has been shown to be successful in an independent cohort. We aim to develop a machine learning model that accurately predicts the death risk of COVID-19 patients at an early stage and that holds up in independent cohorts.

Methods

To identify key features, we used a cohort of 4,711 patients with clinical features describing patient physiological conditions and lab test data related to inflammation, hepatorenal function, cardiovascular function, and so on. We first developed a novel data preprocessing approach to clean up the clinical features and then developed an ensemble machine learning method to identify the key features.
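The abstract does not specify how the ensemble method combines its component models. As a rough illustration only, one common way to pick key features with an ensemble is to average each feature's importance rank across several models and keep the best-ranked ones. The sketch below assumes this rank-averaging scheme; the function name `aggregate_rankings`, the feature names, and the importance values are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def aggregate_rankings(importances_per_model, top_k):
    """Average each feature's rank across models and return the top_k features.

    Assumes every model scores the same set of features. This is a generic
    rank-averaging scheme, not the method used in the study.
    """
    rank_sums = defaultdict(float)
    for importances in importances_per_model:
        # Rank 1 is the most important feature for this model.
        ordered = sorted(importances, key=importances.get, reverse=True)
        for rank, feature in enumerate(ordered, start=1):
            rank_sums[feature] += rank
    n_models = len(importances_per_model)
    mean_rank = {f: total / n_models for f, total in rank_sums.items()}
    # Lower mean rank is better; break ties alphabetically for determinism.
    return sorted(mean_rank, key=lambda f: (mean_rank[f], f))[:top_k]


# Hypothetical importance scores from three hypothetical models.
model_a = {"age": 0.40, "crp": 0.30, "creatinine": 0.20, "sodium": 0.10}
model_b = {"age": 0.35, "creatinine": 0.33, "crp": 0.22, "sodium": 0.10}
model_c = {"crp": 0.50, "age": 0.30, "sodium": 0.15, "creatinine": 0.05}

top_features = aggregate_rankings([model_a, model_b, model_c], top_k=2)
print(top_features)
```

Averaging ranks rather than raw scores keeps models with differently scaled importance values from dominating the selection.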

Results

We identified 14 key clinical features whose combination achieved good predictive performance, with an AUC of 0.907. Most importantly, we successfully validated these key features in a large independent cohort of 15,790 patients.
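For readers unfamiliar with the metric, AUC (area under the ROC curve) can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as half. The sketch below computes it by direct pairwise comparison; the labels and scores are made-up toy values, not data from either cohort, and `roc_auc` is an illustrative helper rather than the authors' code.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for the positive class (e.g. death), 0 for the negative class.
    scores: predicted risk, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 positives and 3 negatives with hypothetical risk scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 8 of 9 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in cohort size, so for cohorts of thousands of patients a rank-based formulation or a library routine would be the practical choice.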

Conclusions

Our study shows that the 14 key features are robust and useful for predicting the risk of death at an early stage in patients with confirmed SARS-CoV-2 infection, and that they are potentially useful in clinical settings to support clinical decisions.

Article activity feed

  1. SciScore for 10.1101/2022.01.10.22268985:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    There are also some limitations in this study. First of all, for cohort 1 (the training set), the patient population we studied was mainly hospitalized patients, and they generally exhibited more severe symptoms and therefore had a higher mortality rate than the general population, which may have caused some bias in our predictive model in the general population. Second, the characteristics of the cohort may vary performance of models and its ability to be validated. For example, the model’s performance was slightly lower in cohort 2 than cohort 1, because the structure of the two cohorts, such as age distribution, sex ratio, mortality rate, etc. is different. In addition, although we adopt functionally similar features, the differences between these features may also be responsible for the difference in model performance between cohorts. Moreover, since most of the clinical features adopted in this study were missing to varying degrees, the imputed data were affected by other data, which may affect the accuracy of the predictive model. Finally, COVID-19 pandemics are often accompanied by surges in patient numbers, resulting in difficulties in collecting all the required clinical features data, which will limit the application of our predictive model.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.