Within-family validation of polygenic risk scores in the UK Biobank: Investigating the role of principal component analysis and mixed model association

Abstract

A central challenge in the field of polygenic risk prediction has been measuring and controlling for confounding effects mediated through population stratification.

Traditionally, control has been attempted through the inclusion of the top principal components (PCs) of variation and the use of linear mixed models in genome-wide association studies (GWAS). Calculating the genomic inflation factor (λ) and assessing the predictive ability of polygenic risk scores (PRS) in within-family settings have been considered important steps in validating that GWAS summary statistics are relatively unbiased estimates of causal effects.
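For readers unfamiliar with the genomic inflation factor, the following is a minimal sketch of the usual median-based definition: λ is the median of the observed 1-degree-of-freedom chi-square association statistics divided by the median expected under the null (about 0.4549). The function name and the use of per-SNP z-scores as input are illustrative assumptions, not details taken from the study.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (approx. 0.4549).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(z_scores):
    """Return lambda = median(z^2) / median of chi-square(1 df).

    z_scores: per-SNP association z-statistics (hypothetical input format).
    """
    chi2 = [z * z for z in z_scores]  # squared z-scores are 1-df chi-square statistics
    return median(chi2) / CHI2_1DF_MEDIAN
```

Values of λ near 1 are conventionally read as little inflation of the test statistics, while larger values suggest stratification, cryptic relatedness, or polygenicity.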

In this study, we examined the relationship between the number of PCs included during GWAS/PRS model development and the observed attenuation in performance when moving from a population-level setting to a cohort composed of discordant sibling pairs. This was done for four complex diseases in the UK Biobank: coronary artery disease, type 2 diabetes, breast cancer, and prostate cancer. Comparisons were made with educational attainment, a trait well known to be highly prone to environmental confounding.
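One simple way to express such attenuation, shown here only as a sketch, is the fraction of a population-level effect that is lost when the same score is evaluated within sibling pairs. The abstract does not state which performance metric or formula the authors used, so the function name, the choice of a generic effect size (for example, a log odds ratio per standard deviation of PRS), and the example numbers are all assumptions.

```python
def relative_attenuation(population_effect, within_family_effect):
    """Fraction of the population-level effect lost within families.

    Both arguments are effect sizes on the same scale (assumed here,
    e.g. log odds ratio per SD of the PRS). 0.0 means no attenuation;
    1.0 means the effect disappears entirely within families.
    """
    if population_effect == 0:
        raise ValueError("population_effect must be non-zero")
    return 1.0 - within_family_effect / population_effect


# Illustrative values only, not results from the study.
example = relative_attenuation(0.5, 0.4)  # roughly 0.2, i.e. about 20% attenuation
```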

We find that, contrary to expectation, increasing the number of included PCs does not consistently reduce within-family attenuation. Furthermore, any reduction in predictive attenuation does not appear to closely correlate with reduction in the genomic inflation factor λ. Additionally, there appears to be little added benefit to prediction or reduced attenuation with the addition of mixed model-based approaches. Taken together, our results demonstrate that the effects of controlling for population stratification can be trait-specific and can vary substantially depending on the number of SNPs or PCs included in the PRS model. These results highlight the inherent complexity of controlling for population stratification in the UK Biobank and suggest that further research is needed to establish best practices for both detecting and effectively minimizing confounding during polygenic prediction tasks.
