Within-family attenuation of polygenic risk score accuracy: Investigating the effects of principal component analysis, LD score regression, and mixed model association in the UK Biobank
Abstract
A central challenge in the field of polygenic risk prediction has been measuring and controlling for confounding effects mediated through population stratification. Traditionally, such control has been attempted by including the top principal components (PCs) of genetic variation as covariates and by using linear mixed models in genome-wide association studies (GWAS). Reductions in test-statistic inflation, commonly assessed using the genomic inflation factor (λ) and the linkage disequilibrium score regression (LDSC) intercept, as well as the preservation of polygenic risk score (PRS) predictive performance in within-family settings, are often taken as evidence that such confounding has been adequately controlled.
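Both diagnostics named above are computed from GWAS summary statistics. The sketch below assumes 1-df chi-square statistics and per-SNP LD scores are already available. It computes λ as the median observed chi-square divided by its null median (about 0.4549). It approximates the LDSC intercept with a plain least-squares fit of chi-square on LD score; the real LDSC software also applies regression weights, a block jackknife, and SNP filtering. The helper names and simulated data are hypothetical. The toy data contain polygenic signal but no confounding, which shows why the two diagnostics can disagree: λ rises above 1 while the intercept stays near 1.

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom: (Phi^-1(0.75))^2.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2: np.ndarray) -> float:
    """Genomic inflation factor: median observed chi-square over its null median."""
    return float(np.median(chi2) / CHI2_1DF_MEDIAN)


def ldsc_intercept(chi2: np.ndarray, ld_scores: np.ndarray) -> float:
    """Intercept of an unweighted least-squares fit of chi-square on LD score.

    Simplified sketch: the LDSC software also uses regression weights,
    a block jackknife for standard errors, and SNP / chi-square filters.
    """
    X = np.column_stack([np.ones_like(ld_scores), ld_scores])
    coef, *_ = np.linalg.lstsq(X, chi2, rcond=None)
    return float(coef[0])


# Hypothetical summary statistics under pure polygenicity:
# E[chi2_j] = 1 + slope * l_j with no confounding, so the true intercept is 1.
rng = np.random.default_rng(0)
ld_scores = rng.uniform(1, 200, size=200_000)  # hypothetical LD scores
expected_chi2 = 1.0 + 0.002 * ld_scores
chi2 = rng.normal(0.0, np.sqrt(expected_chi2)) ** 2  # E[z^2] = expected_chi2

lam = genomic_inflation(chi2)                # pushed above 1 by polygenic signal
intercept = ldsc_intercept(chi2, ld_scores)  # stays close to 1
print(f"lambda = {lam:.3f}, LDSC-style intercept = {intercept:.3f}")
```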
In this study, we examine the relationship between the number of PCs included during GWAS/PRS model development and the attenuation in PRS performance observed when moving from a population-level setting to a cohort of discordant sibling pairs (within-family attenuation). Because siblings share ancestry and family environment, this design can detect confounding from environmental and genetic background effects that are correlated with population structure. Analyses were conducted in the self-described White subset of UK Biobank (UKB) for coronary artery disease, type 2 diabetes, breast cancer, and prostate cancer. Educational attainment was included as a comparison trait because it is known to exhibit substantial within-family attenuation.
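The abstract does not spell out how within-family attenuation is quantified. One simple way, assumed here rather than taken from the study, is to compare the population-level PRS slope with the slope of the within-sibling-pair phenotype difference on the within-pair PRS difference, and report attenuation as 1 minus their ratio. The sketch uses a toy quantitative trait, whereas the study's disease traits rely on discordant sibling pairs. All parameters are hypothetical: a direct PRS effect of 0.2 plus a confounding effect of 0.2 on the family-shared PRS component. Family-shared terms cancel in sibling differences, so the population slope should be near 0.3, the within-family slope near 0.2, and attenuation near one third.

```python
import numpy as np


def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)


def within_family_attenuation(prs: np.ndarray, pheno: np.ndarray):
    """prs, pheno: arrays of shape (n_families, 2), one row per sibling pair.

    Hypothetical definition: attenuation = 1 - beta_within / beta_population.
    """
    beta_pop = ols_slope(prs.ravel(), pheno.ravel())
    # Sibling differences remove anything shared within the family.
    beta_wf = ols_slope(prs[:, 0] - prs[:, 1], pheno[:, 0] - pheno[:, 1])
    return beta_pop, beta_wf, 1.0 - beta_wf / beta_pop


rng = np.random.default_rng(1)
n_fam = 100_000
fam_prs = rng.normal(0, np.sqrt(0.5), size=(n_fam, 1))        # PRS part shared by siblings
prs = fam_prs + rng.normal(0, np.sqrt(0.5), size=(n_fam, 2))  # Var(prs) = 1
fam_env = rng.normal(0, np.sqrt(0.5), size=(n_fam, 1))        # shared family environment
direct, confound = 0.2, 0.2                                   # hypothetical effect sizes
pheno = direct * prs + confound * fam_prs + fam_env + rng.normal(size=(n_fam, 2))

beta_pop, beta_wf, attenuation = within_family_attenuation(prs, pheno)
print(f"population {beta_pop:.3f}, within-family {beta_wf:.3f}, attenuation {attenuation:.2f}")
```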
We find that increasing the number of included PCs does not consistently reduce within-family attenuation among the traits examined. Moreover, reductions in attenuation do not closely track decreases in λ or the LDSC intercept, and the use of mixed model-based approaches provides little additional benefit for prediction or attenuation. These patterns persist even in settings where GWAS test-statistic inflation has been substantially reduced.
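As a toy illustration of the mechanism PC adjustment targets, and not a reproduction of the study's analysis, the sketch below simulates two hypothetical subpopulations that differ in allele frequencies and in mean phenotype, with no SNP having any causal effect. Marginal association tests are run after residualising phenotype and genotypes on an intercept plus the top k PCs. The helper names, sample sizes, and effect sizes are assumptions. In this idealised setting λ is strongly inflated with no PCs and returns to about 1 once the top PC is included. The example only shows how λ responds to PC adjustment; it says nothing about whether the same adjustment reduces within-family attenuation, which is the question the study examines empirically.

```python
import numpy as np

CHI2_1DF_MEDIAN = 0.4549364  # median of chi-square with 1 df


def genomic_inflation(chi2: np.ndarray) -> float:
    return float(np.median(chi2) / CHI2_1DF_MEDIAN)


def marginal_chi2(geno: np.ndarray, pheno: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """Per-SNP 1-df statistics after residualising on covariates (Frisch-Waugh-Lovell)."""
    n = geno.shape[0]
    C = np.column_stack([np.ones(n), covariates])
    M = np.column_stack([pheno, geno])
    resid = M - C @ np.linalg.lstsq(C, M, rcond=None)[0]
    y, G = resid[:, 0], resid[:, 1:]
    r = (G.T @ y) / (np.linalg.norm(G, axis=0) * np.linalg.norm(y))  # partial correlations
    df = n - C.shape[1] - 1
    return df * r**2 / (1 - r**2)  # squared t-statistic, approximately chi-square(1)


rng = np.random.default_rng(7)
n_per_pop, n_snps = 500, 2000
pop = np.repeat([0, 1], n_per_pop)  # hypothetical two-subpopulation sample
p_anc = rng.uniform(0.2, 0.8, n_snps)
shift = rng.normal(0, 0.1, n_snps)
freqs = np.clip(np.stack([p_anc - shift, p_anc + shift]), 0.05, 0.95)  # (2, n_snps)
geno = rng.binomial(2, freqs[pop]).astype(float)                     # (n, n_snps)
pheno = 0.5 * pop + rng.normal(size=pop.size)  # stratified phenotype, no causal SNPs

# Top PCs from the SVD of the standardised genotype matrix.
Z = (geno - geno.mean(axis=0)) / geno.std(axis=0)
U, _, _ = np.linalg.svd(Z, full_matrices=False)

lam_by_k = {k: genomic_inflation(marginal_chi2(geno, pheno, U[:, :k])) for k in (0, 1, 10)}
for k, lam in lam_by_k.items():
    print(f"{k:2d} PCs: lambda = {lam:.2f}")
```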
Our results suggest that the confounding in PRS targeted by PC adjustment and linear mixed models is either minimal in the self-described White subset of the UKB or persists in forms that current population structure adjustment methods do not adequately capture. Taken together, our findings point to limits on how much standard population structure adjustment methods, or reductions in test-statistic inflation, currently improve the causal validity of PRS built using the UK Biobank.