Clinical care site data integration reveals heterogeneity in EHR phenotyping and healthcare utilization patterns

Abstract

Objective

Genomic research using electronic health record (EHR)-linked biobanks is influenced by heterogeneity in the clinical settings (care sites) where encounters occur. We developed two methods leveraging care site data: ClinicScan identifies where phenotype documentation occurs, and ClinicWAS identifies specialty utilization patterns associated with a risk factor.

Materials and Methods

We extracted care sites for each clinical encounter at an academic medical center and mapped each to a clinical specialty. ClinicScan summarizes the specialty distribution of a user-specified diagnosis; ClinicWAS fits a logistic regression for each care site to identify specialty encounters associated with a user-specified risk factor. We applied ClinicScan to depression to test whether requiring a psychiatry encounter strengthened the association between a polygenic risk score (PRS) and a depression phenotype. We applied ClinicWAS to a coronary heart disease (CHD) PRS to identify care sites enriched for high-risk patients.
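The two methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`clinic_scan`, `clinic_was`, `fit_logistic`), the record layout, and the toy data are all hypothetical. The sketch also assumes that the ClinicWAS outcome is "patient had any encounter at the site" and that the risk factor is the only predictor; the abstract does not state the model direction or covariates.

```python
import math
from collections import Counter


def clinic_scan(encounters, site_to_specialty, diagnosis):
    """Fraction of encounters carrying `diagnosis`, broken down by specialty."""
    counts = Counter(
        site_to_specialty[e["care_site"]]
        for e in encounters
        if diagnosis in e["diagnoses"]
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {spec: n / total for spec, n in counts.most_common()}


def fit_logistic(x, y, iters=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x.

    Returns (b0, b1, se_b1). Assumes the data are not perfectly separated.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # 2x2 Fisher information matrix.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of b1 from the information at the last iteration.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


def clinic_was(risk, visited_sites, sites):
    """One logistic regression per care site (assumed form, see lead-in).

    risk: {patient_id: risk factor value, e.g. a standardized PRS}
    visited_sites: {patient_id: set of care sites with at least one encounter}
    Returns {site: (odds_ratio, z_score)}.
    """
    patients = list(risk)
    x = [risk[pid] for pid in patients]
    results = {}
    for site in sites:
        y = [1 if site in visited_sites[pid] else 0 for pid in patients]
        if 0 < sum(y) < len(y):  # skip sites with no variation in the outcome
            _, b1, se = fit_logistic(x, y)
            results[site] = (math.exp(b1), b1 / se)
    return results


# Toy, made-up data for illustration only.
toy_encounters = [
    {"care_site": "clinic_a", "diagnoses": {"depression"}},
    {"care_site": "clinic_b", "diagnoses": {"depression", "anxiety"}},
    {"care_site": "clinic_a", "diagnoses": {"hypertension"}},
]
toy_mapping = {"clinic_a": "primary care", "clinic_b": "psychiatry"}
toy_distribution = clinic_scan(toy_encounters, toy_mapping, "depression")
```

In a real analysis the per-site models would typically adjust for covariates and correct for testing many sites; both are omitted here to keep the sketch short.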

Results

Across 64,983,257 encounters, 2,544 care sites mapped to 57 specialties. Most depression diagnoses occurred in primary care (30.3%) and psychiatry (19.8%). Requiring a psychiatry encounter strengthened the PRS-phenotype association (OR=1.30, 95% CI 1.26–1.35) versus two or more diagnosis codes alone (OR=1.21, 95% CI 1.19–1.24). CHD ClinicWAS identified 19 associated care sites, including 5 catheterization labs. Men and women with high genetic risk (PRS≥95th percentile) underwent catheterization for CHD 3.1 (1.5–4.6) and 4.6 (2.5–6.7) years earlier than normal-risk participants, respectively.
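As a rough aid to reading the reported odds ratios, the sketch below back-calculates the standard error of the log odds ratio implied by each 95% CI. It assumes the intervals are Wald intervals, symmetric on the log scale, which the abstract does not state; the helper names `wald_se_from_ci` and `or_ci` are hypothetical.

```python
import math


def wald_se_from_ci(lower, upper, z=1.96):
    """SE of log(OR) implied by a CI assumed symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def or_ci(beta, se, z=1.96):
    """Odds ratio and Wald CI from a log-odds coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Reported intervals: psychiatry-encounter definition vs. two or more codes.
se_psychiatry = wald_se_from_ci(1.26, 1.35)
se_two_codes = wald_se_from_ci(1.19, 1.24)

# Round trip: rebuild an interval from the reported point estimate.
rebuilt = or_ci(math.log(1.30), se_psychiatry)
```

Because the two phenotype definitions are applied to overlapping participants, the estimates are not independent, so these standard errors should not be plugged into a simple two-sample test of the difference.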

Discussion

Care site data capture phenotype heterogeneity that otherwise distorts EHR-based phenotypes and obscures high-risk subpopulations.

Conclusion

Clinical care site data are an under-utilized resource in EHR-linked biobanks.
