Development of a data-driven COVID-19 prognostication tool to inform triage and step-down care for hospitalised patients in Hong Kong: a population-based cohort study

Eva L. H. Tsui
Carrie S. M. Lui
Pauline P. S. Woo
Alan T. L. Cheung
Peggo K. W. Lam
Van T. W. Tang
C. F. Yiu
C. H. Wan
Libby H. Y. Lee


Listed in

Evaluated articles (ScreenIT)

Abstract

Background

This is the first study on prognostication in an entire cohort of laboratory-confirmed COVID-19 patients in the city of Hong Kong. A prognostic tool is essential to the contingency response for the next wave of the outbreak. This study aims to develop prognostic models that predict COVID-19 patients’ clinical outcome on day 1 and day 5 of hospital admission.

Methods

We conducted a retrospective analysis of a complete cohort of 1037 laboratory-confirmed COVID-19 patients in Hong Kong as of 30 April 2020, who were admitted to 16 public hospitals; their data were sourced from an integrated electronic health records system. The data covered demographic information, chronic disease history, presenting symptoms, the worst clinical condition status, biomarker readings and the Ct value of PCR tests on Day-1 and Day-5 of admission. The study subjects were randomly split into training and testing datasets in an 8:2 ratio. An Extreme Gradient Boosting (XGBoost) model was used to classify the training data into three disease severity groups on Day-1 and Day-5.
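The 8:2 split described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the split is stratified by outcome class (the SciScore excerpt quotes the manuscript as splitting "proportional to outcome distribution"), it assumes class counts of 50, 485 and 502 obtained by rounding the reported percentages of 1037, and it assumes each class is cut at the floor of 80%. The function name and seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_num=8, train_den=10, seed=0):
    """Return (train_idx, test_idx), splitting each outcome class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Integer arithmetic: floor of 80% of this class goes to training.
        cut = len(members) * train_num // train_den
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)


# ASSUMED class counts: the reported 4.8% / 46.8% / 48.4% of 1037, rounded.
counts = {"critical/serious": 50, "stable": 485, "satisfactory": 502}
labels = [name for name, n in counts.items() for _ in range(n)]

train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 829 208 under these assumed counts
```

Under these assumptions the per-class floors (40 + 388 + 401) give 829 training and 208 testing subjects, which agrees with the sizes quoted in the rigor table. The actual rounding rule used in the study is not stated.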

Results

The 1037 patients had a mean age of 37.8 years (SD ± 17.8), and 53.8% of them were male. They were grouped under three disease outcomes: 4.8% critical/serious, 46.8% stable and 48.4% satisfactory. Under the full models, 30 indicators on Day-1 and Day-5 were used to predict the patients’ disease outcome, achieving accuracy rates of 92.3% and 99.5% respectively. As a trade-off between practical application and predictive accuracy, the full models were reduced to simpler models with seven common specific predictors: the worst clinical condition status (4-level), age group, and five biomarkers, namely CRP, LDH, platelet, neutrophil/lymphocyte ratio and albumin/globulin ratio. The Day-1 model’s accuracy rate, macro-/micro-averaged sensitivity and macro-/micro-averaged specificity were 91.3%, 84.9%/91.3% and 96.0%/95.7% respectively, compared with 94.2%, 95.9%/94.2% and 97.8%/97.1% under the Day-5 model.
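For readers unfamiliar with macro- and micro-averaging, the sketch below computes both from a three-class confusion matrix using one-vs-rest counts. The matrix is invented for illustration and is not the study's result; the function name is hypothetical. It also shows why the micro-averaged sensitivity reported above equals the accuracy rate: in a single-label problem, every misclassification is one false negative for the true class, so pooled TP / (TP + FN) is simply correct / total.

```python
def averaged_metrics(cm):
    """Macro/micro sensitivity and specificity from a square confusion matrix.

    cm[i][j] = number of subjects whose true class is i and predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    # One-vs-rest counts for each class.
    tp = [cm[i][i] for i in range(k)]
    fn = [sum(cm[i]) - tp[i] for i in range(k)]
    fp = [sum(cm[r][i] for r in range(k)) - tp[i] for i in range(k)]
    tn = [total - tp[i] - fn[i] - fp[i] for i in range(k)]

    sens = [tp[i] / (tp[i] + fn[i]) for i in range(k)]
    spec = [tn[i] / (tn[i] + fp[i]) for i in range(k)]
    return {
        "accuracy": sum(tp) / total,
        # Macro: average the per-class rates, so each class counts equally.
        "macro_sensitivity": sum(sens) / k,
        "macro_specificity": sum(spec) / k,
        # Micro: pool the counts first, so each subject counts equally.
        "micro_sensitivity": sum(tp) / (sum(tp) + sum(fn)),
        "micro_specificity": sum(tn) / (sum(tn) + sum(fp)),
    }


# HYPOTHETICAL matrix (rows: true, columns: predicted), in the order
# critical/serious, stable, satisfactory. Not taken from the paper.
cm = [
    [8, 2, 0],
    [1, 90, 6],
    [0, 9, 92],
]
m = averaged_metrics(cm)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With K classes, the pooled counts also give micro-averaged specificity = 1 - (1 - accuracy) / (K - 1). For K = 3 this is consistent with the reported pairs: 91.3% accuracy with 95.7% micro-averaged specificity, and 94.2% with 97.1%. Macro-averaged values can differ substantially when a small class, such as the critical/serious group, is predicted less well than the others.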

Conclusions

Both the Day-1 and Day-5 models can accurately predict disease severity. Relevant clinical management could be planned according to the patients’ predicted outcome. The model has been transformed into a simple online calculator that provides a convenient clinical reference tool at the point of care, with the aim of informing clinical decisions on triage and step-down care.
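As a rough sketch of what the input side of such a calculator might look like, the snippet below assembles the seven predictors named in the Results, deriving the two ratios from raw laboratory values. Everything here is an assumption made for illustration: the field names, the 1 to 4 integer coding of the worst clinical condition status, the free-text age band (the abstract gives no cut-offs) and the example values. It does not reproduce the published calculator or its scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReducedModelInput:
    """The seven predictors of the reduced model (hypothetical field names)."""
    worst_condition_level: int  # 4-level status; the 1-4 coding is ASSUMED
    age_group: str              # band label; cut-offs are not given in the abstract
    crp: float
    ldh: float
    platelet: float
    nl_ratio: float             # neutrophil / lymphocyte
    ag_ratio: float             # albumin / globulin


def ratio(numerator, denominator, name):
    """Divide with a guard so a missing or zero lab value fails loudly."""
    if denominator <= 0:
        raise ValueError(f"{name}: denominator must be positive")
    return numerator / denominator


def build_input(worst_condition_level, age_group, crp, ldh, platelet,
                neutrophil, lymphocyte, albumin, globulin):
    if worst_condition_level not in (1, 2, 3, 4):
        raise ValueError("worst_condition_level must be 1, 2, 3 or 4")
    return ReducedModelInput(
        worst_condition_level=worst_condition_level,
        age_group=age_group,
        crp=crp,
        ldh=ldh,
        platelet=platelet,
        nl_ratio=ratio(neutrophil, lymphocyte, "neutrophil/lymphocyte"),
        ag_ratio=ratio(albumin, globulin, "albumin/globulin"),
    )


# HYPOTHETICAL patient values, chosen only to exercise the code.
example = build_input(
    worst_condition_level=2, age_group="band A",
    crp=1.2, ldh=210.0, platelet=230.0,
    neutrophil=4.5, lymphocyte=1.5,
    albumin=40.0, globulin=32.0,
)
print(example.nl_ratio, example.ag_ratio)  # 3.0 1.25
```

Computing the ratios in one guarded helper keeps data-entry errors, such as a lymphocyte count entered as zero, from silently producing an infinite or meaningless predictor.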

Version published to 10.1186/s12911-020-01338-0
Dec 1, 2020

SciScore for 10.1101/2020.07.13.20152348: (What is this?)

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	not detected.
Randomization	For development as well as evaluation of the model, the entire 1,037 study subjects were, proportional to outcome distribution, randomly split into a training dataset comprising of 829 subjects and a testing dataset of the remaining 208 subjects.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Partial dependency plots were output to depict the marginal effect of each model feature on the predicted outcome. (Appendix Figure 1) The XGBoost models were carried out by using Python’s XGboost version 1.10 whereas other statistical analyses by SAS version 9.4 software.	Python’s suggested: (PyMVPA, RRID:SCR_006099)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

Limitation of the study includes the lack of inclusion of data from radiological imaging. Patients with COVID-19 are found to have lung infection with ground glass and consolidative opacities with peripheral and lower lung distribution and bilateral involvement [29]. We are going to include all chest X-ray images up to day 5 in the next study through AI approach of image analytics.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Version published to 10.1101/2020.07.13.20152348 on medRxiv
Jul 14, 2020

A Preliminary Prognostic Model for Predicting Poor Prognosis in COVID-19 Integrating Lung Epithelial Injury (KL-6) with Routine Care Markers

This article has 7 authors:
1. Yunlai Liang
2. Kun Wang
3. Lu Long
4. Qizhuo Hou
5. Wenze Yu
6. Kangkang Huang
7. Bin Yi
This article has no evaluations. Latest version: Feb 3, 2026

Rule-Based Electronic Sepsis Alerts Identify High-Risk Patients Despite Poor Diagnostic Accuracy: A Real-World Evaluation and Implications for Machine Learning

This article has 5 authors:
1. Eanna L Lowney
2. Steven G Hirth
3. Laura Fanning BPharm
4. Graeme J Duke
5. Owen Roodenburg
This article has no evaluations. Latest version: Jan 13, 2026

Early Risk Stratification in Hospitalized Community-Acquired UTI: An 8-Item Bedside Score for Bacteremia and 30-Day Mortality

This article has 2 authors:
1. Cihan Semet
2. Yusuf Görgülü
This article has no evaluations. Latest version: Jan 1, 2026
