Measuring the accuracy of electronic health record (EHR)-based phenotyping in the All of Us Research Program to optimize statistical power for genetic association testing

Abstract

Accurate phenotyping is an essential task for researchers utilizing electronic health record (EHR)-linked biobank programs like the All of Us Research Program (AoU) to study human genetics. While their large cohort sizes offer increased statistical power for detecting novel risk alleles, those benefits are undermined if participants’ disease status cannot be accurately determined from EHRs. Little guidance is available on how to select an EHR-based phenotyping procedure that maximizes downstream statistical power. We used observed carrier frequencies of known risk genes for ovarian, female breast, and colorectal cancers to estimate the accuracy of EHR-based phenotyping strategies for each disease in AoU (v7). We found that the choice of phenotype definition can have a substantial impact on statistical power for association testing, particularly for rarer diseases. Additionally, our results suggest that the accuracy of higher-complexity phenotyping algorithms is inconsistent across Black and non-Hispanic White participants in AoU, highlighting the potential for case ascertainment biases to impact downstream association testing. We discuss the implications of these findings as well as potential mitigation strategies.
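To illustrate why carrier frequencies of known risk genes can say something about phenotyping accuracy, here is a minimal sketch under a simple two-component mixture assumption: an EHR-defined case group is a mix of true cases and misclassified non-cases, so its observed carrier frequency lies between the two groups' frequencies. This is not the authors' estimator; the function names and all numeric values are hypothetical and chosen only for illustration.

```python
def observed_carrier_frequency(ppv: float, f_true_cases: float, f_controls: float) -> float:
    """Carrier frequency expected in an EHR-defined case group.

    Assumes (hypothetically) that a fraction `ppv` of labelled cases are true
    cases carrying risk variants at rate `f_true_cases`, and the rest are
    non-cases carrying them at the control rate `f_controls`.
    """
    return ppv * f_true_cases + (1.0 - ppv) * f_controls


def estimate_ppv(f_observed: float, f_true_cases: float, f_controls: float) -> float:
    """Invert the mixture above to recover the positive predictive value."""
    if f_true_cases <= f_controls:
        raise ValueError("carrier frequency in true cases must exceed that in controls")
    ppv = (f_observed - f_controls) / (f_true_cases - f_controls)
    # Sampling noise can push the raw estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, ppv))


# Illustrative, made-up numbers (not taken from the article):
# 10% carriers among true cases, 2% among controls, 6% observed in labelled cases.
ppv_hat = estimate_ppv(f_observed=0.06, f_true_cases=0.10, f_controls=0.02)
print(f"estimated PPV: {ppv_hat:.2f}")
```

Under this assumption, a phenotype definition with lower PPV pulls the observed case carrier frequency toward the control frequency, which shrinks the apparent effect size and with it the power of an association test.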
