Risk prediction for poor outcome and death in hospital in-patients with COVID-19: derivation in Wuhan, China and external validation in London, UK


Abstract

Background

Accurate risk prediction of clinical outcome would usefully inform clinical decisions and intervention targeting in COVID-19. The aim of this study was to derive and validate risk prediction models for poor outcome and death in adult inpatients with COVID-19.

Methods

Models were derived using logistic regression on data from Wuhan, China, with death and poor outcome (death or severe disease) as outcomes. Predictors were demographic, comorbidity, symptom and laboratory test variables. The best-performing models were externally validated in data from London, UK.

Findings

4.3% of the derivation cohort (n=775) died and 9.7% had a poor outcome, compared to 34.1% and 42.9% of the validation cohort (n=226). In derivation, prediction models based on age, sex, neutrophil count, lymphocyte count, platelet count, C-reactive protein and creatinine had excellent discrimination (death c-index=0.91, poor outcome c-index=0.88), with good-to-excellent calibration. Using two cut-offs to define low, high and very-high risk groups, derivation patients were stratified into groups with observed death rates of 0.34%, 15.0% and 28.3% and poor outcome rates of 0.63%, 8.9% and 58.5%. External validation discrimination was good (death c-index=0.74, poor outcome c-index=0.72), as was calibration. However, in the predicted low, high and very-high risk groups, observed death rates were 16.5%, 42.9% and 58.4%, and observed poor outcome rates were 26.3%, 28.4% and 64.8%.
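The two performance measures reported above can be made concrete with a small sketch. The c-index (concordance) is the probability that a randomly chosen patient who had the event received a higher predicted risk than a randomly chosen patient who did not, and the two cut-offs simply partition predicted risk into low, high and very-high groups. The code below is a purely illustrative, stdlib-only sketch with invented risks and cut-off values; it is not the authors' model or data.

```python
# Illustrative sketch of a c-index calculation and two-cutoff risk
# stratification. All numbers below are invented for demonstration.

def c_index(scores, outcomes):
    """Concordance index: fraction of (event, non-event) pairs in which
    the event case has the higher predicted risk; ties count as half."""
    pairs = concordant = 0.0
    for s_event, o_event in zip(scores, outcomes):
        if o_event != 1:
            continue
        for s_other, o_other in zip(scores, outcomes):
            if o_other != 0:
                continue
            pairs += 1
            if s_event > s_other:
                concordant += 1
            elif s_event == s_other:
                concordant += 0.5
    return concordant / pairs

def stratify(risk, low_cut, high_cut):
    """Map a predicted risk to a low / high / very-high group
    using two cut-offs (hypothetical values below)."""
    if risk < low_cut:
        return "low"
    return "high" if risk < high_cut else "very-high"

# Toy predicted risks and observed deaths (0/1), purely illustrative.
risks = [0.02, 0.05, 0.10, 0.30, 0.60, 0.80]
died = [0, 0, 0, 1, 0, 1]

print(c_index(risks, died))                        # 7 of 8 pairs concordant
print([stratify(r, 0.10, 0.50) for r in risks])
```

With these toy values the c-index is 7/8 = 0.875, and the six patients fall into the groups low, low, high, high, very-high, very-high. In the paper, group-level calibration is then assessed by comparing observed event rates within each predicted risk group.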

Interpretation

Our prediction model using demography and routinely available laboratory tests performed very well in internal validation in the lower-risk derivation population, but less well in the much higher-risk external validation population. Further external validation is required. Before any risk prediction tool is routinely implemented in clinical care, collaboration is needed to create larger derivation datasets and to rapidly validate all proposed prediction models externally in a range of populations.

Funding

MRC, Wellcome Trust, HDR-UK, LifeArc, participating hospitals, NNSFC, National Key R&D Program, Pudong Health and Family Planning Commission

Research in context

Evidence before this study

Several prognostic models for predicting mortality risk, progression to severe disease, or length of hospital stay in COVID-19 have been published. 1 Commonly reported predictors of severe prognosis in patients with COVID-19 include age, sex, computed tomography scan features, C-reactive protein (CRP), lactate dehydrogenase, and lymphocyte count. Symptoms (notably dyspnoea) and comorbidities (e.g. chronic lung disease, cardiovascular disease and hypertension) are also reported to be associated with poor prognosis. 2 However, most studies have not described the study population or intended use of their prediction models, and external validation is rare and has to date been done only with datasets from other Wuhan hospitals. 3 Given different patterns of testing and organisation of healthcare pathways, external validation in datasets from other countries is required.

Added value of this study

This study used data from Wuhan, China to derive and internally validate multivariable models to predict poor outcome and death in COVID-19 patients after hospital admission, with external validation using data from King’s College Hospital, London, UK. Mortality and poor outcome occurred in 4.3% and 9.7% of patients in Wuhan, compared to 34.1% and 42.9% of patients in London. Models based on age, sex and simple routinely available laboratory tests (lymphocyte count, neutrophil count, platelet count, CRP and creatinine) had good discrimination and calibration in internal validation, but performed only moderately well in external validation. Models based on age, sex, symptoms and comorbidity were adequate in internal validation for poor outcome (ICU admission or death) but had poor performance for death alone.

Implications of all the available evidence

This study and others find that relatively simple risk prediction models using demographic, clinical and laboratory data perform well in internal validation but at best moderately well in external validation, either because derivation and external validation populations are small (Xie et al 3 ) and/or because they vary greatly in casemix and severity (our study). There are three decision points where risk prediction may be most useful: (1) deciding whom to test; (2) deciding which patients in the community are at high risk of poor outcomes; and (3) identifying patients at high risk at the point of hospital admission. Larger studies focusing on particular decision points, with rapid external validation in multiple datasets, are needed. A key gap is risk prediction tools for use in community triage (decisions to admit, or to keep at home with varying intensities of follow-up including telemonitoring) or in low-income settings where laboratory tests may not be routinely available at the point of decision-making. This requires systematic data collection in community and low-income settings to derive and evaluate appropriate models.

Article activity feed

  1. SciScore for 10.1101/2020.04.28.20082222:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The study also has a number of limitations. First, the clinical datasets were collected when healthcare services were under severe strain. Data extraction sought to ensure consistency and accuracy, but was not blind to outcome, and there is missing data in both datasets. Second, the datasets used are smaller than ideal (although as large as or larger than previous studies), and there are relatively few deaths in particular. Our analytical approach aimed to minimise overfitting, but further research using larger, federated datasets is clearly required. Third, clinical assessments at admission such as SpO2 are likely to be important predictors of short-term outcome,3 but were not available in either dataset. Fourth, our external validation dataset has very different case-mix where spectrum effects are likely to contribute to lower prediction model performance, and only has follow-up to a fixed date (period range: [6-39] days, although this is a reasonable time-horizon to inform clinical decision-making at hospital admission). Finally, all data available is for people with PCR-diagnosed COVID-19 who are admitted to hospital (decision 3 in Figure 1). Although the Wuhan cohort includes many people with less severe disease, in the validation cohort most admitted patients are likely to have severe disease. The findings therefore cannot be assumed to be applicable to decisions made earlier in the course of disease (decision 2 in Figure 1). Our univariate findings are similar to other...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.