A multicenter evaluation of computable phenotyping approaches for SARS-CoV-2 infection and COVID-19 hospitalizations

Abstract

Diagnosis codes are used to study SARS-CoV-2 infections and COVID-19 hospitalizations in administrative and electronic health record (EHR) data. Using EHR data (April 2020–March 2021) from the Yale-New Haven Health System and the three hospital systems of the Mayo Clinic, computable phenotype definitions based on the ICD-10 diagnosis code for COVID-19 (U07.1) were evaluated against positive SARS-CoV-2 PCR or antigen tests. We included 69,423 patients at Yale and 75,748 at Mayo Clinic with either a diagnosis code or a positive SARS-CoV-2 test. The precision and recall of a COVID-19 diagnosis for a positive test were 68.8% and 83.3%, respectively, at Yale, with higher precision (95%) and lower recall (63.5%) at Mayo Clinic, varying from 59.2% in Rochester to 97.3% in Arizona. For hospitalizations with a principal COVID-19 diagnosis, 94.8% at Yale and 80.5% at Mayo Clinic had an associated positive laboratory test, and a secondary diagnosis of COVID-19 identified additional patients. These additional patients had twofold higher in-hospital mortality than patients identified by a principal diagnosis. Coding practices need to be standardized before diagnosis codes are used in clinical research and epidemiological surveillance of COVID-19.
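As a minimal sketch of the evaluation described in the abstract, precision and recall can be computed by treating a positive SARS-CoV-2 PCR or antigen test as the reference standard and the U07.1 diagnosis code as the phenotype. The function name `phenotype_metrics` and the patient identifiers below are hypothetical toy values, not study data or the authors' code.

```python
from __future__ import annotations


def phenotype_metrics(coded: set[str], test_positive: set[str]) -> dict[str, float]:
    """Compare a diagnosis-code phenotype against a laboratory reference standard.

    coded: patient IDs with an ICD-10 U07.1 diagnosis (the phenotype).
    test_positive: patient IDs with a positive SARS-CoV-2 PCR or antigen test.
    """
    # Patients flagged by the code AND confirmed by a positive test.
    true_pos = coded & test_positive
    # Precision: share of coded patients who have a positive test.
    precision = len(true_pos) / len(coded) if coded else float("nan")
    # Recall: share of test-positive patients who received the code.
    recall = len(true_pos) / len(test_positive) if test_positive else float("nan")
    return {"precision": precision, "recall": recall}


# Hypothetical toy cohort (not study data).
coded = {"p1", "p2", "p3", "p4"}
test_positive = {"p2", "p3", "p4", "p5", "p6"}
print(phenotype_metrics(coded, test_positive))
# {'precision': 0.75, 'recall': 0.6}
```

Because the cohort is the union of both sets (patients with either a diagnosis code or a positive test), patients with neither never enter the calculation; with no true negatives available, precision and recall are the natural metrics rather than specificity.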

  1. SciScore for 10.1101/2021.03.16.21253770:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: IRB: [3] The study was approved by the Yale University Institutional Review Board (IRB # 2000027747).
    Randomization: Manual Chart Abstraction and Validation: Manual chart abstraction was conducted independently by 2 clinicians (RK and WLS) and focused on a sample of randomly selected charts in which the diagnosis codes were discordant with laboratory results.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.
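The randomization entry above describes chart review of a random sample of records in which diagnosis codes and laboratory results disagree. A minimal sketch of drawing such a sample, assuming two hypothetical discordant strata (code without a positive test, and positive test without a code) and a hypothetical per-stratum sample size; the report does not state the sampling scheme actually used.

```python
from __future__ import annotations

import random


def sample_discordant(coded: set[str], test_positive: set[str],
                      n_per_group: int, seed: int = 0) -> dict[str, list[str]]:
    """Draw a reproducible random sample of discordant records for chart review.

    Hypothetical strata:
      code_only: U07.1 diagnosis without a positive test
      test_only: positive test without a U07.1 diagnosis
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    # Sort so sampling does not depend on set iteration order.
    groups = {
        "code_only": sorted(coded - test_positive),
        "test_only": sorted(test_positive - coded),
    }
    # Sample without replacement; take everyone if a stratum is smaller than n.
    return {
        name: rng.sample(ids, min(n_per_group, len(ids)))
        for name, ids in groups.items()
    }


# Hypothetical toy cohort (not study data).
coded = {"p1", "p2", "p3", "p4", "p7"}
test_positive = {"p2", "p3", "p4", "p5", "p6"}
print(sample_discordant(coded, test_positive, n_per_group=1))
```

Each sampled chart would then be reviewed independently by both clinicians, as the entry describes.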

    Table 2: Resources

    Experimental Models: Organisms/Strains
    Sentence: To evaluate the effect of coding strategies on case identification among racial and ethnic minorities, we combined racial/ethnic groups into mutually exclusive groups of Hispanic, non-Hispanic White, non-Hispanic Black, and other race/ethnicity groups.[22,23] Study Outcome: Among patients hospitalized with COVID-19, we evaluated differences in mortality across case identification strategy.
    Resource: non-Hispanic White (suggested: None)
    Software and Algorithms
    Sentence: Analyses were conducted using Spark 2.3.2, Python 3.6.9, and R 3.8.
    Resource: Spark (suggested: Spark, RRID:SCR_006207)
    Resource: Python (suggested: IPython, RRID:SCR_001658)
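The race/ethnicity grouping quoted in Table 2 combines separate race and ethnicity fields into four mutually exclusive categories. A minimal sketch, assuming hypothetical coded values ("Hispanic", "White", "Black") and that Hispanic ethnicity takes precedence over race, as the "non-Hispanic" labels suggest; the function name is illustrative, not the authors' code.

```python
from __future__ import annotations


def race_ethnicity_group(race: str | None, ethnicity: str | None) -> str:
    """Map separate race and ethnicity fields to one mutually exclusive group.

    Assumes hypothetical coded values: ethnicity "Hispanic"; race "White" or "Black".
    """
    # Hispanic ethnicity takes precedence, so every Hispanic patient lands here
    # regardless of recorded race; this keeps the groups mutually exclusive.
    if ethnicity == "Hispanic":
        return "Hispanic"
    # Remaining patients are treated as non-Hispanic, including those with a
    # missing ethnicity field (an assumption; a study may handle these differently).
    if race == "White":
        return "non-Hispanic White"
    if race == "Black":
        return "non-Hispanic Black"
    # All other or unknown races fall into the residual category.
    return "Other"


patients = [("White", "Hispanic"), ("White", None), ("Black", "Not Hispanic"), ("Asian", None)]
print([race_ethnicity_group(r, e) for r, e in patients])
# ['Hispanic', 'non-Hispanic White', 'non-Hispanic Black', 'Other']
```

Checking ethnicity first is the design choice that matters: if race were checked first, a Hispanic White patient could be counted in two groups.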

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our study has certain limitations. First, while we focus on a broad interconnected health system with affiliated laboratories and receive testing information from laboratories connected to the Epic EHR, results of outpatient testing performed at non-participating laboratories may not be available. However, when 2 clinicians manually reviewed the charts of a sample of patients with a COVID-19 diagnosis but no reported positive PCR or antigen test, all such records belonged to patients undergoing SARS-CoV-2 testing, with the diagnosis assigned to the clinical or laboratory encounter at which the test was obtained. Second, we cannot account for differences in coding practices at other institutions. However, our study includes a large integrated multi-hospital health system, which increases the generalizability of our observations. Moreover, greater variation in diagnosis coding for SARS-CoV-2 infection surveillance at other institutions would only further highlight the lack of reliability of these measures.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.