Regional performance variation in external validation of four prediction models for severity of COVID-19 at hospital admission: An observational multi-centre cohort study
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Prediction models should be externally validated to assess their performance before implementation. Several prediction models for coronavirus disease 2019 (COVID-19) have been published. This observational cohort study aimed to validate published prediction models of disease severity for hospitalized patients with COVID-19, using clinical and laboratory predictors.
Methods
Prediction models fitting relevant inclusion criteria were chosen for validation. The outcome was either mortality or a composite outcome of mortality and intensive care unit (ICU) admission (severe disease). A total of 1295 patients admitted with symptoms of COVID-19 at King's College Hospital (KCH) in London, United Kingdom, and 307 patients at Oslo University Hospital (OUH) in Oslo, Norway, were included. The performance of the models was assessed in terms of discrimination and calibration.
Results
We identified two models for prediction of mortality (referred to as Xie and Zhang1) and two models for prediction of severe disease (Allenbach and Zhang2). The performance of the models was variable. For prediction of mortality, the Xie model had good discrimination at OUH, with an area under the receiver-operating characteristic curve (AUROC) of 0.87 [95% confidence interval (CI) 0.79–0.95], and acceptable discrimination at KCH (AUROC 0.79 [0.76–0.82]). For prediction of severe disease, the Allenbach model had acceptable discrimination at both sites (OUH AUROC 0.81 [0.74–0.88]; KCH AUROC 0.72 [0.68–0.75]). The Zhang models had moderate to poor discrimination. Calibration was initially poor for all models but improved after recalibration.
Conclusions
The performance of the four prediction models was variable. The Xie model had the best discrimination for mortality, while the Allenbach model had acceptable results for prediction of severe disease.
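As a rough illustration of the kind of validation reported in the abstract (this is a minimal sketch, not the authors' code), discrimination with a bootstrap confidence interval and simple logistic recalibration of a published model's linear predictor could be computed in Python with scikit-learn roughly as follows; the variable names `y` and `lin_pred` are hypothetical.

```python
# Illustrative sketch only, not the study's code. Assumes `y` (observed binary outcome)
# and `lin_pred` (the published model's linear predictor) are NumPy arrays; both names
# are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

def auroc_with_ci(y, risk, n_boot=2000):
    """AUROC point estimate with a percentile bootstrap 95% CI."""
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        if len(np.unique(y[idx])) < 2:        # a resample must contain both classes
            continue
        estimates.append(roc_auc_score(y[idx], risk[idx]))
    return roc_auc_score(y, risk), np.percentile(estimates, [2.5, 97.5])

def logistic_recalibration(lin_pred, y):
    """Re-estimate the calibration intercept and slope on the validation data."""
    lr = LogisticRegression().fit(lin_pred.reshape(-1, 1), y)
    return lr.predict_proba(lin_pred.reshape(-1, 1))[:, 1]
```

Recalibration of this form only shifts and rescales the predicted risks, so it typically improves calibration without changing the ranking of patients (and hence the AUROC), which is consistent with the pattern reported above.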
Article activity feed
SciScore for 10.1101/2021.03.26.21254390:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement
IRB: The OUH project protocol was approved by the Regional Ethical Committee of South East Norway (Reference 137045).
Consent: Informed consent was waived because of the strictly observational nature of the project.
Randomization not detected. Blinding not detected. Power analysis not detected. Sex as a biological variable not detected.
Table 2: Resources
Software and Algorithms
Sentence: “A structured search was performed in PubMed with the words “COVID-19” and “prediction model” or “machine learning” or “prognosis model”.”
PubMed suggested: (PubMed, RRID:SCR_004846)
Sentence: “Prediction models included in the review by Wynants et al. [7] in May 2020 were also investigated, as well as a search for articles/preprints citing Wynants et al. using Google Scholar on 18.05.2020.”
Google Scholar suggested: (Google Scholar, RRID:SCR_008878)
Sentence: “Demographics, clinical variables and hospital stay information were manually recorded in the registry and merged with laboratory results exported from the laboratory information system in Microsoft Excel.”
Microsoft Excel suggested: (Microsoft Excel, RRID:SCR_016137)
Sentence: “Missing values (i.e. no recorded values within 24 hours) were generally imputed using k-nearest neighbors (KNN), although we tested more advanced techniques based on Python’s scikit-learn IterativeImputer, including random forest-based imputation, and multiple imputation using Bayesian ridge and Gaussian process methods [16, 17].”
Python suggested: (PyMVPA, RRID:SCR_006099); scikit-learn suggested: (scikit-learn, RRID:SCR_002577)
Sentence: “All statistical analyses were conducted in Python 3.7 and R 3.4 [19].”
Python suggested: (IPython, RRID:SCR_001658)
Results from OddPub: Thank you for sharing your code and data.
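For context, a minimal sketch of the imputation step described in the sentences above, using scikit-learn's KNNImputer with IterativeImputer as the alternative the authors mention; this is not the study's actual pipeline, and the input file and predictor column names are hypothetical placeholders.

```python
# Minimal sketch of the imputation approach described above; not the study's pipeline.
# The file name and predictor columns are hypothetical placeholders.
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

df = pd.read_csv("admission_data.csv")                   # hypothetical input file
predictors = ["age", "lymphocyte_count", "ldh", "crp"]   # hypothetical predictor columns

# Default approach: k-nearest-neighbour imputation of values missing within 24 h of admission.
knn = KNNImputer(n_neighbors=5)
df_knn = pd.DataFrame(knn.fit_transform(df[predictors]), columns=predictors)

# Alternative the authors report testing: iterative (multiple-imputation-style) imputation,
# shown here with a Bayesian ridge estimator.
mice = IterativeImputer(estimator=BayesianRidge(), random_state=0)
df_mice = pd.DataFrame(mice.fit_transform(df[predictors]), columns=predictors)
```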
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
Secondly, there might be weaknesses in the models, as bias is common in prediction models [12]. To date, only the Allenbach study has been published in a peer-reviewed journal, while Xie and Zhang are preprints. Thirdly, criteria for ICU admittance might vary across sites. The fact that we and other studies generally find better discrimination for mortality than for severe disease (often defined by ICU admittance) supports this hypothesis. For instance, patients with short life expectancy will often not be admitted to the ICU, but given oxygen therapy in a hospital ward and transferred to nursing homes for palliative care. These patients, not fulfilling the criteria for severe disease, often have predictors that indicate severe disease at admission. Many prediction models have been published, but few have been systematically validated [24]. To our knowledge, only one study to date has validated COVID-19 prediction models: Gupta et al. recently validated 22 prognostic models [6], including the Xie and Zhang models. For the OUH cohort, we found substantially better discrimination for the Xie and Allenbach models for the prediction of mortality and severe disease, respectively. The performance of the models at KCH was more similar to the results in the Gupta study, also performed at a London hospital. The rate of severe disease, mortality and the characteristics of the London cohorts are quite similar, which might explain the similar performance at these two sites. Several other predic...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al. (2015).
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.