Prospective Predictive Performance Comparison between Clinical Gestalt and Validated COVID-19 Mortality Scores
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Most COVID-19 mortality scores were developed at the beginning of the pandemic, and clinicians now have more experience and more evidence-based interventions. Therefore, we hypothesized that the predictive performance of COVID-19 mortality scores is now lower than originally reported. We aimed to prospectively evaluate the current predictive accuracy of six COVID-19 scores and to compare it with the accuracy of clinical gestalt predictions. A total of 200 patients with COVID-19 were enrolled at a tertiary hospital in Mexico City between September and December 2020. The area under the curve (AUC) was determined for the LOW-HARM, qSOFA, MSL-COVID-19, NUTRI-CoV, and NEWS2 scores, for the neutrophil to lymphocyte ratio, and for clinical gestalt predictions of death (expressed as a percentage). In total, 166 patients (106 men and 60 women aged 56±9 years) with confirmed COVID-19 were included in the analysis. The AUC of all scores was significantly lower than originally reported: LOW-HARM 0.76 (95% CI 0.69 to 0.84) vs 0.96 (95% CI 0.94 to 0.98), qSOFA 0.61 (95% CI 0.53 to 0.69) vs 0.74 (95% CI 0.65 to 0.81), MSL-COVID-19 0.64 (95% CI 0.55 to 0.73) vs 0.72 (95% CI 0.69 to 0.75), NUTRI-CoV 0.60 (95% CI 0.51 to 0.69) vs 0.79 (95% CI 0.76 to 0.82), NEWS2 0.65 (95% CI 0.56 to 0.75) vs 0.84 (95% CI 0.79 to 0.90), and neutrophil to lymphocyte ratio 0.65 (95% CI 0.57 to 0.73) vs 0.74 (95% CI 0.62 to 0.85). Clinical gestalt predictions were non-inferior to mortality scores, with an AUC of 0.68 (95% CI 0.59 to 0.77). Adjusting scores with locally derived likelihood ratios did not improve their performance; however, some scores outperformed clinical gestalt predictions when clinicians’ confidence in their prediction was <80%. Despite its subjective nature, clinical gestalt has relevant advantages in predicting COVID-19 clinical outcomes. The need for, and the performance of, most COVID-19 mortality scores should be re-evaluated regularly.
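To make the AUC figures above concrete, the following is a minimal Python sketch of how an AUC could be computed for a percentage-style prediction such as clinical gestalt. It uses the rank-based (Mann-Whitney) definition: the probability that a randomly chosen non-survivor received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function name `auc` and the eight example patients are hypothetical and are not taken from the study, which reports its analysis in Stata.

```python
def auc(predictions, died):
    """Rank-based AUC: P(prediction of a non-survivor > prediction of a survivor).

    predictions: predicted risk of death per patient (e.g. gestalt, 0-100 %).
    died:        1 if the patient died, 0 if the patient survived.
    Ties between a non-survivor and a survivor count as 0.5.
    """
    deaths = [p for p, d in zip(predictions, died) if d]
    survivors = [p for p, d in zip(predictions, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")

    concordant = 0.0
    for p_death in deaths:
        for p_surv in survivors:
            if p_death > p_surv:
                concordant += 1.0
            elif p_death == p_surv:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


# Hypothetical gestalt predictions (%) and outcomes for eight patients.
gestalt_pct = [10, 80, 35, 60, 20, 90, 50, 35]
outcome = [0, 1, 0, 0, 0, 1, 1, 1]
print(auc(gestalt_pct, outcome))  # 13.5 concordant pairs out of 16 -> 0.84375
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, which is the scale on which the values in the abstract should be read.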
Article activity feed
SciScore for 10.1101/2021.04.16.21255647
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement: IRB: This study was approved by the Ethics Committee for Research on Humans of the National Institute of Medical Sciences and Nutrition Salvador Zubirán on August 25, 2020 (Reg. No. DMC-3369-20-20-1-1a).
Randomization: not detected.
Blinding: not detected.
Power Analysis: Sample size rationale: Using “easyROC” (20), an open R-based web tool for estimating sample sizes for direct and non-inferiority AUC comparisons based on Obuchowski’s method (21), we calculated that 159 patients would be needed to detect non-inferiority with a >0.05 maximal AUC difference from the reported LOW-HARM AUC (0.96, 95% CI 0.94 to 0.98), with a case allocation ratio of 0.7 (because the mortality in our centre is ∼0.3), a power of 0.8, and a significance cut-off level of 0.05.
Sex as a biological variable: not detected.
Table 2: Resources
Software and Algorithms
Sentence: The AUC differences were analysed using DeLong’s method with the STATA function “roccomp” (22).
Resource: STATA; suggested: (Stata, RRID:SCR_012763)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
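For readers without access to Stata, the comparison of two correlated AUCs mentioned in Table 2 can be sketched in Python. The block below is an illustrative, standard-library implementation of DeLong's nonparametric approach for two predictors measured on the same patients; it is not the authors' code and is not a reimplementation of “roccomp”. The function names (`delong_test` and its helpers) and the example data are hypothetical.

```python
import math


def _psi(x, y):
    """Kernel of the Mann-Whitney statistic: 1 if x > y, 0.5 if tied, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _placements(scores, died):
    """Per-patient placement values for deaths (v10) and survivors (v01)."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _sample_var(values):
    """Unbiased sample variance (denominator n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def delong_test(scores_a, scores_b, died):
    """Return (auc_a, auc_b, z, two_sided_p) for two paired predictors.

    Requires at least two deaths and two survivors.
    """
    v10a, v01a = _placements(scores_a, died)
    v10b, v01b = _placements(scores_b, died)
    m, n = len(v10a), len(v01a)
    if m < 2 or n < 2:
        raise ValueError("need at least two deaths and two survivors")

    auc_a = sum(v10a) / m
    auc_b = sum(v10b) / m

    # Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B), computed directly on the
    # paired differences of placement values, separately for each outcome group.
    d10 = [a - b for a, b in zip(v10a, v10b)]
    d01 = [a - b for a, b in zip(v01a, v01b)]
    variance = _sample_var(d10) / m + _sample_var(d01) / n
    if variance == 0:
        raise ValueError("zero variance: the test statistic is undefined")

    z = (auc_a - auc_b) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Hypothetical example: two scores evaluated on the same eight patients.
died = [1, 1, 1, 1, 0, 0, 0, 0]
score_a = [9, 8, 7, 3, 6, 2, 1, 4]
score_b = [5, 9, 2, 6, 7, 3, 8, 1]
print(delong_test(score_a, score_b, died))
```

Because both predictors are evaluated on the same patients, the variance is computed on paired differences rather than by treating the two AUCs as independent, which is the point of using a correlated-AUC test in this setting.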
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study: This work highlights the inherent limitations of statistically derived scores and some of the advantages of Clinical Gestalt predictions. In other scenarios where using predictive scores is frequent, more experienced clinicians can always ponder their sometimes subjective, yet quite valuable, insight. However, with the COVID-19 pandemic, clinicians with all levels of training started their learning curve at the same time. In this study, we had the unique opportunity of re-evaluating more than one score (two of them in the same setting and for the same purpose they were designed for), while testing the accuracy of Clinical Gestalt in a group of clinicians who started their learning curve for managing a disease at the same time (experience and training within healthcare teams is usually mixed for other diseases). Additionally, we explored the accuracy of Clinical Gestalt across different degrees of prediction confidence. To our knowledge, this is the first time that this type of analysis has been done for subjective clinical predictions, and it proved to be quite insightful. The fact that Clinical Gestalt’s accuracy correlates with confidence in prediction suggests that, while there is value in subjective predictions, it is also important to ask ourselves how confident we are about our predictions. Interestingly, our results suggest Clinical Gestalt predictions are particularly prone to be positively biased: clinicians were more likely to correctly predict which patients would surv...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on page 27. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.