Optimizing Polygenic Scores for Complex Morphological Traits: A Case Study in Nasal Shape Prediction
Abstract
Polygenic scores (PGS) facilitate the prediction of an individual’s phenotype from their genotype. Typically, PGS methods apply regularization or use a clumping and thresholding (C+T) approach to handle SNP inclusion. To achieve good prediction accuracy, these approaches rely on effect size estimates from well-powered genome-wide association studies (GWAS). However, this is currently not feasible for morphological shape when phenotyped as univariate traits. Here, we introduce a novel framework to enhance polygenic prediction through three key components: (1) leveraging multivariate GWAS summary statistics for improved SNP selection, (2) defining genetically informative phenotypes, and (3) benchmarking PGS methods to select the optimal model. Our approach integrates multivariate GWAS, which tests all phenotypic variables jointly in a single omnibus test and thereby gains power. Specifically, it uses P values from multivariate GWAS to improve SNP selection while retaining the effect size estimates for the univariate trait under investigation, allowing the use of current PGS tools. We evaluated our proposed method for predicting 3D nasal morphology using a dataset of 52,896 individuals of European ancestry from the UK Biobank. Using the C+T method, SNP selection based on multivariate GWAS resulted in significantly improved phenotypic prediction (P = 9.74e-5) for eigen-shapes, with a mean variance explained of 3.88% (SD = 1.59%) compared to 2.02% (SD = 1.10%) using a traditional univariate approach in the test set (n = 2,896). We also tested whether heritability-optimized phenotypes were more predictable than eigen-shapes derived from principal component analysis (PCA). On average, with SNP selection based on multivariate GWAS using the C+T method, heritability-optimized phenotypes yielded greater predictive performance, with PGS explaining 2.72%-10.37% of phenotypic variance, compared to 1.05%-6.84% for eigen-shapes.
Furthermore, benchmarking several PGS methods revealed that LDpred2 consistently achieved the best performance for predicting nasal morphology. Our results demonstrate that combining multivariate GWAS P values with optimized phenotypes and advanced PGS models leads to more accurate polygenic prediction for complex morphological traits.
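The SNP-selection idea described above (multivariate P values decide which SNPs enter the score, univariate effect sizes weight them) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the greedy pairwise clumping on in-sample genotype correlation, and the default thresholds are all assumptions made for illustration. Real C+T implementations typically clump within genomic windows using an LD reference panel.

```python
import numpy as np

def clump_and_threshold(genotypes, select_p, p_threshold=5e-8, r2_threshold=0.1):
    """Greedy clumping plus P thresholding (simplified, hypothetical helper).

    genotypes: (n_individuals, n_snps) allele-count matrix.
    select_p:  P values used ONLY for SNP selection; in the framework above
               these would come from the multivariate GWAS.
    """
    candidates = np.flatnonzero(select_p < p_threshold)
    # Visit candidate SNPs from most to least significant.
    order = candidates[np.argsort(select_p[candidates])]
    kept = []
    for j in order:
        in_ld = False
        for k in kept:
            r = np.corrcoef(genotypes[:, j], genotypes[:, k])[0, 1]
            if r * r > r2_threshold:  # too correlated with a stronger SNP
                in_ld = True
                break
        if not in_ld:
            kept.append(j)
    return np.array(kept, dtype=int)

def polygenic_score(genotypes, univariate_beta, kept):
    """Weighted allele-count sum using univariate effect sizes for kept SNPs."""
    return genotypes[:, kept] @ univariate_beta[kept]

def variance_explained(score, phenotype):
    """Squared Pearson correlation between score and phenotype."""
    r = np.corrcoef(score, phenotype)[0, 1]
    return r * r
```

In this sketch, switching between the univariate and multivariate selection strategies amounts to passing a different `select_p` array while leaving `univariate_beta` unchanged, which is why existing PGS tooling can still be used downstream.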