Prognostic model to identify and quantify risk factors for mortality among hospitalised patients with COVID-19 in the USA
Listed in
- Evaluated articles (ScreenIT)
Abstract
Objectives
To develop a prognostic model to identify and quantify risk factors for mortality among patients admitted to the hospital with COVID-19.
Design
Retrospective cohort study. Patients were randomly assigned to either training (80%) or test (20%) sets. The training set was used to fit a multivariable logistic regression. Predictors were ranked using variable importance metrics. Models were assessed by C-indices, Brier scores and calibration plots in the test set.
Setting
Optum de-identified COVID-19 Electronic Health Record dataset including over 700 hospitals and 7000 clinics in the USA.
Participants
17 086 patients hospitalised with COVID-19 between 20 February 2020 and 5 June 2020.
Main outcome measure
All-cause mortality while hospitalised.
Results
The full model, which included information on demographics, comorbidities, laboratory results and vital signs, had good discrimination (C-index=0.87) and was well calibrated, with some overprediction for the most at-risk patients. Results were similar on the training and test sets, suggesting that there was little overfitting. Age was the most important risk factor. The performance of models that included all demographics and comorbidities (C-index=0.79) was only slightly better than that of a model that included only age (C-index=0.76). Across the study period, predicted mortality was 1.3% for patients aged 18 years, 8.9% for those aged 55 years and 28.7% for those aged 85 years. Predicted mortality across all ages declined over the study period from 22.4% by March to 14.0% by May.
Conclusion
Age was the most important predictor of all-cause mortality, although vital signs and laboratory results added considerable prognostic information, with oxygen saturation, temperature, respiratory rate, lactate dehydrogenase and white cell count being among the most important predictors. Demographic and comorbidity factors did not improve model performance appreciably. The full model had good discrimination and was reasonably well calibrated, suggesting that it may be useful for assessment of prognosis.
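The evaluation design described in the abstract (random 80/20 split, logistic regression fitted on the training set, C-index and Brier score computed on the test set) can be illustrated with a minimal sketch. This is not the authors' code: the cohort is simulated, age is the only predictor, the coefficients in `simulate_cohort` are arbitrary, and all function names are hypothetical. It uses only the Python standard library.

```python
import math
import random


def simulate_cohort(n, seed=0):
    """Return synthetic (age, died) pairs. Assumption: NOT the Optum data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(18, 95)
        # Hypothetical true risk: log-odds of death rise linearly with age.
        risk = 1.0 / (1.0 + math.exp(-(-6.0 + 0.06 * age)))
        rows.append((age, 1 if rng.random() < risk else 0))
    return rows


def split_train_test(rows, train_frac=0.8, seed=1):
    """Randomly assign rows to a training set and a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def fit_logistic(rows, lr=0.5, epochs=400):
    """Fit a one-predictor logistic regression by batch gradient descent."""
    ages = [age for age, _ in rows]
    mean = sum(ages) / len(ages)
    sd = math.sqrt(sum((a - mean) ** 2 for a in ages) / len(ages))
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for age, died in rows:
            z = (age - mean) / sd  # standardised age keeps the descent stable
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))
            g0 += p - died
            g1 += (p - died) * z
        b0 -= lr * g0 / len(rows)
        b1 -= lr * g1 / len(rows)
    # Return a function mapping age (years) to predicted mortality risk.
    return lambda age: 1.0 / (1.0 + math.exp(-(b0 + b1 * (age - mean) / sd)))


def c_index(preds, outcomes):
    """Share of (death, survivor) pairs in which the death has the higher
    predicted risk; tied predictions count as half."""
    deaths = [p for p, y in zip(preds, outcomes) if y == 1]
    survivors = [p for p, y in zip(preds, outcomes) if y == 0]
    score = 0.0
    for p in deaths:
        for q in survivors:
            score += 1.0 if p > q else 0.5 if p == q else 0.0
    return score / (len(deaths) * len(survivors))


def brier_score(preds, outcomes):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)


cohort = simulate_cohort(1500)
train, test = split_train_test(cohort)
predict = fit_logistic(train)

test_preds = [predict(age) for age, _ in test]
test_died = [died for _, died in test]
print("test C-index:", round(c_index(test_preds, test_died), 3))
print("test Brier score:", round(brier_score(test_preds, test_died), 3))
```

A C-index of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, while a lower Brier score indicates predictions closer to the observed outcomes. Comparing both metrics between the training and test sets is one way to check for the kind of overfitting the abstract refers to.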
Article activity feed
SciScore for 10.1101/2020.09.22.20196204: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement: not detected.
Randomization: To validate the model, we randomly split the data into a training and test set using an 80/20 split and evaluated the model in both the training and the test sets.
Blinding: not detected.
Power Analysis: not detected.
Sex as a biological variable: not detected.
Table 2: Resources
No key resources detected.
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study: Study Limitations: This study is not without limitations. First, there was considerable missing data, especially for laboratory results. We attempted to overcome this limitation using multiple imputation, although the coefficient estimates are only guaranteed to be unbiased if the data are missing at random and the missing mechanism is known. While this is an untestable assumption, our diagnostics were not suggestive of problems in the imputation as the distributions of the observed and imputed data were very similar. Second, many of the laboratory results contained outliers. Although we truncated these variables to improve fit, predictions for new patients with extreme laboratory values lying outside of the chosen bounds are inherently uncertain. The presence of outliers could also imply that some laboratory values have been miscoded. This miscoding is a form of measurement error that would attenuate the relationship between mortality and the laboratory values [56,57]. Third, we did not have data on the day of death or out-of-hospital mortality. The latter could mean that mortality is underestimated if patients are discharged from the hospital and later die at home from COVID-19. Evidence suggests that COVID-19 deaths in the hospital comprise 38% of all deaths, but since the proportion of those 38% who were previously hospitalized is unknown, it is difficult to calibrate the extent of this potential bias [12]. Without day of death data, we were unable to perform time to event...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.