Prediction of Polygenic Risks by Screening Thousands of Polygenic Scores

Abstract

Rationale

A polygenic score (PGS) summarizes a person’s genetic information for a trait in a single number, and its utility in genomic medicine is well recognized. Although PGS models have been generated for many traits, they are not broadly available for some traits because studies of infrequent outcomes have limited sample sizes. Predicting these less-studied traits (e.g., treatment or drug responses) would often be of great clinical utility, but such predictions have been underutilized owing to limited statistical power. To support more versatile trait prediction, we present a method for developing PGS models that can be used in studies of any size with genome-wide SNP data.

Method

We first generate thousands of PGSs for each study participant in a given data set using their genome-wide SNP data and public resources. A PGS-wide scan then evaluates each PGS by the Area Under the Curve (AUC) of prediction for a binary trait, or by the R-squared of association for a quantitative trait. We present two methods for PGS model development: SECRET-Best, which selects the most predictive PGS from the scan, and SECRET-WTSUM, which combines multiple correlation-pruned PGSs into a single score. The algorithm is scalable and implemented in a user-friendly software tool, SECRET (Screen and Evaluate Catalogued Risk scores to Enhance Trait predictions).
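The scan-then-select idea described above can be illustrated with a minimal sketch. This is not the SECRET implementation: the function names, the toy scores, the 0.8 correlation cutoff, and the weighting by (AUC - 0.5) are all assumptions made for illustration, since the abstract does not specify how SECRET-WTSUM prunes or weights the scores. AUC is assumed here to mean the area under the ROC curve, computed by direct pairwise comparison of cases and controls.

```python
from math import sqrt

def auc(scores, labels):
    """Pairwise AUC: chance that a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def scan(pgs_table, labels):
    """PGS-wide scan: (pgs_id, AUC) pairs, most predictive first."""
    results = [(pid, auc(vals, labels)) for pid, vals in pgs_table.items()]
    return sorted(results, key=lambda r: r[1], reverse=True)

def prune(ranked, pgs_table, max_abs_r=0.8):
    """Greedy pruning: keep a PGS only if it is weakly correlated with every PGS already kept.

    The 0.8 cutoff is a hypothetical default, not a value from the paper.
    """
    kept = []
    for pid, a in ranked:
        if all(abs(pearson(pgs_table[pid], pgs_table[k])) < max_abs_r for k, _ in kept):
            kept.append((pid, a))
    return kept

def weighted_sum(kept, pgs_table):
    """Combine the kept PGSs per participant; weights of (AUC - 0.5) are an assumption."""
    n = len(next(iter(pgs_table.values())))
    return [sum((a - 0.5) * pgs_table[pid][i] for pid, a in kept) for i in range(n)]

# Toy data: six participants (three controls, three cases) and three made-up PGSs.
labels = [0, 0, 0, 1, 1, 1]
pgs_table = {
    "PGS_A": [0.10, 0.20, 0.30, 0.70, 0.80, 0.90],
    "PGS_B": [0.12, 0.22, 0.72, 0.68, 0.82, 0.92],  # strongly correlated with PGS_A
    "PGS_C": [0.50, 0.10, 0.60, 0.40, 0.90, 0.30],
}

ranked = scan(pgs_table, labels)         # the scan itself
best_id = ranked[0][0]                   # "best single PGS" style selection
kept = prune(ranked, pgs_table)          # correlation pruning
combined = weighted_sum(kept, pgs_table) # combined-score style selection
```

In this toy example the scan ranks PGS_A first, and pruning drops PGS_B because it largely duplicates PGS_A. In practice the scores would typically be standardized before combining, and the selected model would be evaluated on a separate validation set rather than on the data used for the scan.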

Results

We applied SECRET to a binary outcome (type 1 diabetes [T1D]) and a dataset of 2,100 samples, each with 12 laboratory test-related continuous traits. For eight of the nine traits with existing PGSs available in the PGS Catalog, the scan identified the PGS for that same trait as the top predictor. Existing PGS methods had rather limited power to predict trait values in a validation set when only 1,500 samples were used to develop a PGS model, whereas the SECRET methods maintained prediction power for under-powered GWAS even when the study sample size was in the hundreds.

Conclusion

The SECRET methods and tool provide a valuable resource for studies that have genomic data but limited sample sizes. This approach enables systematic development and evaluation of PGS models.
