Differential performance of polygenic prediction across traits and populations depending on genotype discovery approach
Abstract
Polygenic scores (PGS) are widely used for estimating genetic predisposition to complex traits by aggregating the effects of common variants into a single measure. They hold promise in identifying individuals at increased risk for diseases, allowing earlier screening and interventions. Genotyping arrays, commonly used for PGS computation, are affordable and computationally efficient, while whole-genome sequencing (WGS) offers a comprehensive view of genetic variation. Using the same set of individuals, we compared PGS derived from arrays and WGS across multiple traits to evaluate differences in predictive performance, portability across populations, and computational efficiency. We computed PGS for 10 traits across the spectrum of heritability and polygenicity in the three largest genetic ancestry groups in All of Us (European, African American, Admixed American), trained on the multi-ancestry meta-analyses from the Pan-UK Biobank. Using the clumping and thresholding (C+T) method, we found that WGS-based PGS outperformed array-based PGS for highly polygenic traits but showed differentially reduced accuracy for sparse traits in certain populations. This may be attributable to the lower allele frequency observed in clumped variants from WGS compared to arrays. Using the LD-informed PRS-CS method, we observed overall improved prediction performance compared to C+T, with WGS outperforming arrays across most non-cancer traits. In conclusion, while PGS computed using WGS generally provide superior predictive power with PRS-CS, the advantage over arrays is context-dependent, varying by trait, population, and the PGS method. This study provides insights into the complexities and potential advantages of using different genotype discovery approaches for polygenic predictions in diverse populations.
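For readers unfamiliar with the C+T approach named in the abstract, the following is a minimal illustrative sketch, not the authors' pipeline. It assumes a small in-memory dosage matrix (individuals by variants), per-variant effect sizes and p-values from summary statistics, and hypothetical function names and cutoffs (`p_max`, `r2_max`). Production tools also restrict clumping to a physical window around each index variant, which is omitted here for brevity.

```python
import numpy as np

def clump_and_threshold(dosages, pvalues, p_max=5e-8, r2_max=0.1):
    """Return indices of variants kept after p-value thresholding and greedy LD clumping.

    dosages: (n_individuals, n_variants) array of allele dosages in [0, 2].
    pvalues: (n_variants,) array of association p-values.
    p_max and r2_max are illustrative cutoffs, not values from the study.
    """
    # Thresholding: visit variants from most to least significant, skipping any above p_max.
    candidates = [int(j) for j in np.argsort(pvalues) if pvalues[j] <= p_max]
    kept = []
    for j in candidates:
        # Clumping: keep variant j only if its squared correlation (r^2) with
        # every already-kept index variant is below r2_max.
        in_ld = any(
            not (np.corrcoef(dosages[:, j], dosages[:, k])[0, 1] ** 2 < r2_max)
            for k in kept
        )
        if not in_ld:
            kept.append(j)
    return kept

def polygenic_score(dosages, betas, kept):
    """PGS per individual: sum of dosage times effect size over the kept variants."""
    return dosages[:, kept] @ betas[kept]

# Toy data (made up for illustration): 4 individuals, 3 variants.
# Variants 0 and 1 are in perfect LD, so only the more significant one survives clumping.
toy_dosages = np.array([[0, 0, 2],
                        [1, 1, 0],
                        [2, 2, 1],
                        [0, 0, 1]], dtype=float)
toy_betas = np.array([0.5, 0.4, -0.2])
toy_pvalues = np.array([1e-10, 1e-9, 1e-8])

toy_kept = clump_and_threshold(toy_dosages, toy_pvalues, p_max=5e-8, r2_max=0.5)
toy_scores = polygenic_score(toy_dosages, toy_betas, toy_kept)
```

The same scoring function applies whether the dosage matrix comes from an array or from WGS; what differs between the two is which variants are available to be clumped and kept.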