Development and evaluation of a machine learning-based in-hospital COVID-19 disease outcome predictor (CODOP): A multicontinental retrospective study

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This article addresses the unmet need for a machine-learning approach to the early and accurate estimation of risk among patients admitted with COVID-19. The presented data inspire confidence in the validity of the model, since it was developed in a large number of patients and validated in cohorts from different geographical regions.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #3 agreed to share their name with the authors.)

Abstract

New SARS-CoV-2 variants, breakthrough infections, waning immunity, and sub-optimal vaccination rates account for surges of hospitalizations and deaths. There is an urgent need for clinically valuable and generalizable triage tools to assist the allocation of hospital resources, particularly in resource-limited countries. We developed and validated CODOP, a machine learning-based tool for predicting the clinical outcome of hospitalized COVID-19 patients. CODOP was trained, tested and validated with six cohorts encompassing 29223 COVID-19 patients from more than 150 hospitals in Spain, the USA and Latin America during 2020–22. CODOP uses 12 clinical parameters commonly measured at hospital admission to reach high discriminative ability up to 9 days before clinical resolution (AUROC: 0·90–0·96); it is well calibrated and enables effective dynamic risk stratification during hospitalization. Furthermore, CODOP maintains its predictive ability independently of the virus variant and the vaccination status. To reckon with the fluctuating pressure levels in hospitals during the pandemic, we offer two online CODOP calculators, suited for undertriage or overtriage scenarios, validated with a cohort of patients from 42 hospitals in three Latin American countries (78–100% sensitivity and 89–97% specificity). The performance of CODOP in heterogeneous and geographically dispersed patient cohorts and its ease of use strongly suggest its clinical utility, particularly in resource-limited countries.
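
The sensitivity and specificity ranges quoted above depend on the cut-off applied to a risk score, which is presumably why two calculators are offered. The following is a minimal sketch of that trade-off, not CODOP itself: the function name `sensitivity_specificity`, the toy scores and outcomes, and the two thresholds (0.3 and 0.7) are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold are flagged high risk.

    labels: 1 = adverse outcome observed, 0 = no adverse outcome.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if label == 1:
            if flagged:
                tp += 1  # high-risk patient correctly flagged
            else:
                fn += 1  # high-risk patient missed
        else:
            if flagged:
                fp += 1  # low-risk patient flagged unnecessarily
            else:
                tn += 1  # low-risk patient correctly not flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical predicted risks and observed outcomes (not study data).
toy_scores = [0.05, 0.20, 0.35, 0.40, 0.65, 0.80, 0.90, 0.10]
toy_labels = [0, 0, 0, 1, 1, 1, 1, 0]

# A lower cut-off flags more patients: fewer misses, more false alarms.
low_cutoff = sensitivity_specificity(toy_scores, toy_labels, 0.3)
# A higher cut-off flags fewer patients: more misses, fewer false alarms.
high_cutoff = sensitivity_specificity(toy_scores, toy_labels, 0.7)
print(low_cutoff, high_cutoff)
```

On this toy data the lower cut-off trades some specificity for full sensitivity, and the higher cut-off does the reverse; the actual CODOP cut-offs are not given in this text.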

Article activity feed

  1. Evaluation Summary:

    This article addresses the unmet need for a machine-learning approach to the early and accurate estimation of risk among patients admitted with COVID-19. The presented data inspire confidence in the validity of the model, since it was developed in a large number of patients and validated in cohorts from different geographical regions.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #3 agreed to share their name with the authors.)

  2. Reviewer #1 (Public Review):

    This submission addresses the unmet need for a machine-learning approach to the early and accurate estimation of risk among patients admitted with COVID-19. The presented data inspire confidence in the validity of the model, since it was developed in a large number of patients and validated in cohorts from different geographical regions.

  3. Reviewer #2 (Public Review):

    The authors describe the development, by machine learning, of a score named CODOP that predicts in-hospital mortality of patients with COVID-19 pneumonia in an easy and inexpensive way. The score is developed and validated in large and diverse (multinational) cohorts, suggesting robust results. They provide two versions, for over- and under-triage scenarios.
    The manuscript is well written and the statistics are adequate. All related data are provided and no ethical issues arise.

  4. Reviewer #3 (Public Review):

    This is a robust, solid work developing an artificial intelligence-derived model (CODOP) which accurately predicts mortality risk in COVID-19 patients requiring hospitalization. Major strengths include the derivation and validation approach using thousands of patients across different continents, either at a single time point (hospital admission) or across a time period (first nine days following admission). The low number of missing values for the considered variables also contributes to the validity of the results. The eleven parameters considered are commonly used in hospitals all over the world, facilitating its application. The authors compare the performance of CODOP against three reference models. They have also developed an online calculator to ease the clinical application of this model.

  5. SciScore for 10.1101/2021.09.20.21263794:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethics: IRB: 9 The use of the anonymized clinical data of patients from the SEMI-COVID-Registry was approved by the Provincial Research Ethics Committee of Málaga (Spain).
    Sex as a biological variable: not detected.
    Randomization: not detected.
    Blinding: All predictions were done blinded to the final clinical outcome.
    Power Analysis: not detected.

    Table 2: Resources

    Recombinant DNA
    Sentence: The metrics were calculated using R packages pROC21 (version 1.17.0.1) and caret15 (R package version 6.0-86).
    Resource: pROC21 (suggested: None)

    Software and Algorithms
    Sentence: COPE model is a linear regression model, which uses variables age, respiratory rate, C-reactive protein, lactic dehydrogenase, albumin, and urea.
    Resource: COPE (suggested: COPE, RRID:SCR_009153)
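
    The sentence above notes that the metrics were calculated with R packages. For readers who want to see what a discrimination metric such as the AUROC measures, here is a minimal Python sketch using the standard pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is not the pROC implementation, and the function name `auroc` and the toy data are assumptions made for illustration.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 = positive case, 0 = negative case.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical scores and outcomes (not study data).
example_scores = [0.1, 0.4, 0.35, 0.8]
example_labels = [0, 0, 1, 1]
print(auroc(example_scores, example_labels))
```

    The double loop is quadratic in cohort size, which is fine for a sketch; dedicated packages use rank-based computations that scale better.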

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The overall performance of CODOP has inherent limitations, some of them generalizable to any MLH. On the one side, our approach to using training and test datasets with a high degree of perturbations (see above) adds several sources of variability32: pre-analytical due to differences in blood sampling, analytical due to different laboratory protocols, intra- and inter-individual, and inter-hospital and geographical differences in clinical practices. As an additional factor, the high diversity of COVID-19 encompassing more than 60 disease subtypes7 sets a limitation in terms of the discriminability ability and the overall clinical utility of any MHL. In contrast to other predictors and to facilitate its use, CODOP does not take into account the level of care received by each patient (e.g., ICU versus basic care), which influences the outcome of the patient and perturbs the discrimination ability of CODOP (as predictions are made with the data from blood analyses at hospital admission). A clear example is a slightly lower performance of CODOP-Ovt (sensitivity of 73%) in the case of the “Hospital Vélez Sarsfield” from Buenos Aires (named as Argentina (b) in Figure 4B), where all patients analysed by CODOP were finally treated in the ICU. On the other hand, CODOP-Unt would have correctly suggested triaging 84% of these patients already on the day of admission, therefore offering a significant clinical utility. Finally, the clinical utility of MHL has to take into account the chan...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a protocol registration statement.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.