Development of a data-driven COVID-19 prognostication tool to inform triage and step-down care for hospitalised patients in Hong Kong: a population-based cohort study

Abstract

Background

This is the first study on prognostication in an entire cohort of laboratory-confirmed COVID-19 patients in the city of Hong Kong. A prognostic tool is essential to the contingency response for the next wave of the outbreak. This study aims to develop prognostic models that predict COVID-19 patients’ clinical outcomes on day 1 and day 5 of hospital admission.

Methods

We did a retrospective analysis of a complete cohort of 1037 laboratory-confirmed COVID-19 patients in Hong Kong as of 30 April 2020, who were admitted to 16 public hospitals, with their data sourced from an integrated electronic health records system. The data covered demographic information, chronic disease history, presenting symptoms, the worst clinical condition status, biomarker readings and the Ct value of PCR tests on Day-1 and Day-5 of admission. The study subjects were randomly split into training and testing datasets in an 8:2 ratio. An Extreme Gradient Boosting (XGBoost) model was used to classify the training data into three disease severity groups on Day-1 and Day-5.
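The 8:2 split described above, which the SciScore excerpt further down this page describes as proportional to the outcome distribution, can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code: the function name `stratified_split`, the seed, and the per-class rounding rule are assumptions, and the class counts are back-calculated from the rounded percentages in the Results, so they are approximate.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test so each class keeps roughly the same share.

    Hypothetical helper for illustration; the study's actual procedure
    is not specified beyond "proportional to outcome distribution".
    """
    rng = random.Random(seed)

    # Group subject indices by outcome class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumed rounding rule: nearest integer per class.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Approximate class counts derived from 4.8% / 46.8% / 48.4% of 1037.
labels = (
    ["critical/serious"] * 50
    + ["stable"] * 485
    + ["satisfactory"] * 502
)
train_idx, test_idx = stratified_split(labels)
```

With these approximate counts the sketch yields 830 training and 207 testing subjects, close to but not identical to the 829/208 split reported for the study. A difference of this size is expected, since the exact per-class counts and the rounding used by the authors are not given here.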

Results

The 1037 patients had a mean age of 37.8 (SD ± 17.8), and 53.8% of them were male. They were grouped under three disease outcomes: 4.8% critical/serious, 46.8% stable and 48.4% satisfactory. Under the full models, 30 indicators on Day-1 and Day-5 were used to predict the patients’ disease outcome, achieving accuracy rates of 92.3% and 99.5% respectively. With a trade-off between practical application and predictive accuracy, the full models were reduced into simpler models with seven common specific predictors: the worst clinical condition status (4-level), age group, and five biomarkers, namely CRP, LDH, platelet, neutrophil/lymphocyte ratio and albumin/globulin ratio. The Day-1 model’s accuracy rate, macro-/micro-averaged sensitivity and macro-/micro-averaged specificity were 91.3%, 84.9%/91.3% and 96.0%/95.7% respectively, as compared to 94.2%, 95.9%/94.2% and 97.8%/97.1% under the Day-5 model.
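To make the macro- versus micro-averaged figures above concrete, the sketch below computes both kinds of average from a three-class confusion matrix using one-vs-rest counts. The matrix values are invented for illustration and are not the study's results; the function names are likewise hypothetical.

```python
def one_vs_rest_counts(cm):
    """Return (tp, fn, fp, tn) per class for a square confusion matrix.

    cm[i][j] is the number of subjects of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    counts = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true k, predicted other
        fp = sum(cm[r][k] for r in range(n)) - tp  # other, predicted k
        tn = total - tp - fn - fp
        counts.append((tp, fn, fp, tn))
    return counts


def averaged_metrics(cm):
    """Accuracy plus macro/micro sensitivity and specificity.

    Assumes every class has at least one true and one non-member subject,
    otherwise a per-class ratio would divide by zero.
    """
    counts = one_vs_rest_counts(cm)
    total = sum(sum(row) for row in cm)

    # Macro: average the per-class ratios, each class weighted equally.
    sens = [tp / (tp + fn) for tp, fn, fp, tn in counts]
    spec = [tn / (tn + fp) for tp, fn, fp, tn in counts]

    # Micro: pool the counts over classes first, then take one ratio.
    tp_sum = sum(c[0] for c in counts)
    fn_sum = sum(c[1] for c in counts)
    fp_sum = sum(c[2] for c in counts)
    tn_sum = sum(c[3] for c in counts)

    return {
        "accuracy": tp_sum / total,
        "macro_sensitivity": sum(sens) / len(sens),
        "micro_sensitivity": tp_sum / (tp_sum + fn_sum),
        "macro_specificity": sum(spec) / len(spec),
        "micro_specificity": tn_sum / (tn_sum + fp_sum),
    }


# Invented example: rows are true classes, columns are predicted classes,
# in the order critical/serious, stable, satisfactory.
example_cm = [
    [8, 2, 0],
    [3, 40, 7],
    [0, 5, 35],
]
metrics = averaged_metrics(example_cm)
```

For single-label classification, micro-averaged sensitivity always equals overall accuracy, which is consistent with the matching 91.3% and 94.2% figures above. Likewise, with three classes the micro-averaged specificity equals 1 − (1 − accuracy)/2, which agrees with the reported 95.7% and 97.1%. Macro averages weight the small critical/serious group as heavily as the larger groups, so they can differ noticeably from the micro values.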

Conclusions

Both the Day-1 and Day-5 models can accurately predict disease severity. Relevant clinical management could be planned according to each patient’s predicted outcome. The models have been converted into a simple online calculator to provide a convenient clinical reference tool at the point of care, with the aim of informing clinical decisions on triage and step-down care.

Article activity feed

  1. SciScore for 10.1101/2020.07.13.20152348:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: For development as well as evaluation of the model, the entire 1,037 study subjects were, proportional to outcome distribution, randomly split into a training dataset comprising of 829 subjects and a testing dataset of the remaining 208 subjects.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentence: Partial dependency plots were output to depict the marginal effect of each model feature on the predicted outcome. (Appendix Figure 1) The XGBoost models were carried out by using Python’s XGboost version 1.10 whereas other statistical analyses by SAS version 9.4 software.
    Resource: Python’s
    Suggested: (PyMVPA, RRID:SCR_006099)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Limitation of the study includes the lack of inclusion of data from radiological imaging. Patients with COVID-19 are found to have lung infection with ground glass and consolidative opacities with peripheral and lower lung distribution and bilateral involvement [29]. We are going to include all chest X-ray images up to day 5 in the next study through AI approach of image analytics.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.