Validation of an internationally derived patient severity phenotype to support COVID-19 analytics from electronic health record data


Abstract

Objective

The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing coronavirus disease 2019 (COVID-19) with federated analyses of electronic health record (EHR) data. We sought to develop and validate a computable phenotype for COVID-19 severity.

Materials and Methods

Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of 6 code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of intensive care unit (ICU) admission and/or death. We also piloted an alternative machine learning approach and compared selected predictors of severity with the 4CE phenotype at 1 site.

Results

The full 4CE severity phenotype had a pooled sensitivity of 0.73 and specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity varied widely, differing by up to 0.65 across sites. At one pilot site, the expert-derived phenotype had a mean area under the curve of 0.903 (95% confidence interval, 0.886-0.921), compared with 0.956 (95% confidence interval, 0.952-0.959) for the machine learning approach. Billing codes were poor proxies of ICU admission, with precision and recall as low as 49% compared with chart review.
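The pooled metrics above follow the standard definitions of sensitivity and specificity against the ICU admission and/or death outcome. A minimal sketch, using invented confusion-matrix counts chosen only to reproduce the reported ratios (these are not the 4CE study data):

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: severity-flagged patients among those with ICU/death."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True-negative rate: unflagged patients among those without ICU/death."""
    return tn / (tn + fp)

# Hypothetical site: 73 of 100 ICU/death patients were flagged severe,
# and 166 of 200 patients without ICU/death were correctly not flagged.
tp, fn = 73, 27
tn, fp = 166, 34
print(round(sensitivity(tp, fn), 2))  # 0.73
print(round(specificity(tn, fp), 2))  # 0.83
```

In the study these quantities were pooled across the 12 sites; the sketch shows only the per-site arithmetic.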

Discussion

We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly owing to heterogeneous pandemic conditions.

Conclusions

We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.

Article activity feed

  1. SciScore for 10.1101/2020.10.13.20201855:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentence: "4CE Detailed Severity Definition: The codification of the following data elements results in ∼100 codes in ICD-9, ICD-10, LOINC, and RxNorm format, international standards used for research."
    Resource: RxNorm (suggested: RxNorm, RRID:SCR_006645)
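    A code-list phenotype like the one described above can be applied as a simple set-intersection check over a patient's coded data. The sketch below uses invented example codes, not the actual ∼100-code 4CE definition:

    ```python
    # Minimal sketch of applying a code-list severity phenotype.
    # The codes below are illustrative examples only; the real 4CE
    # definition spans ~100 ICD-9, ICD-10, LOINC, and RxNorm codes.

    SEVERITY_CODES = {
        "ICD10:J96.00",  # acute respiratory failure (example)
        "ICD10:R57.9",   # shock, unspecified (example)
        "LOINC:2019-8",  # arterial pCO2, a blood gas order (example)
        "RXNORM:7512",   # norepinephrine, a vasopressor (example)
    }

    def is_severe(patient_codes: set) -> bool:
        """Flag a patient as severe if any code matches the definition."""
        return bool(SEVERITY_CODES & patient_codes)

    print(is_severe({"ICD10:U07.1", "ICD10:J96.00"}))  # True
    print(is_severe({"ICD10:U07.1"}))                  # False
    ```

    In practice each site maps its local EHR codes onto these standard vocabularies before the check, which is one source of the cross-site coding variability the study reports.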

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Validation of these proxies is essential so that we can understand their strengths and limitations. Furthermore, if research is to be performed on a network and especially global scale, the outcome proxies must use data types broadly available through most EHRs and also be validated at multiple sites to account for the differences of coding patterns that occur. Examining subgroup performance of the codes can further improve our ability to understand cross-site differences. In this study, our primary aim was to develop and validate an EHR-based severity algorithm for the 4CE network to enable network-wide research on COVID-19 across numerous heterogeneous sites. The EHR proxies we used to test for severity included commonly available elements in the EHR: diagnosis codes, laboratory orders, medication orders, and procedure codes. These elements improve our ability to infer the presence of respiratory distress and shock, which presumably are serious enough to lead to ICU admission, if available, and/or death. This study highlights the frequent presence of coding differences between sites, as demonstrated by the remarkable variation of sensitivity by code class. Moreover, the codes captured for the severity algorithm at each site are very different. For example, some sites had a very high prevalence of mechanical ventilation codes and blood gas orders, whereas others had a low prevalence of these same measures, likely due to practice variation and code extraction differences. We ...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.