Prediction of Polygenic Risks by Screening Thousands of Polygenic Scores

Wei-Min Chen
Ani Manichaikul
Suna Onengut-Gumuscu
Bradford B. Worrall
Stephen S. Rich

Abstract

Rationale

A polygenic score (PGS) summarizes a person’s genetic information for a trait in a single number, and its utility in genomic medicine is well recognized. Although PGS models have been generated for many traits, they are not broadly available for traits with infrequent outcomes, because studies of those outcomes have limited sample sizes. Prediction of these less-studied traits (e.g., treatment or drug responses) would often be of great clinical utility but has been underutilized because of limited statistical power. To support more versatile trait prediction, we present a method for developing PGS models that can be used in studies of any size with genome-wide SNP data.

Method

We first generate thousands of PGSs for each study participant in a given dataset, using their genome-wide SNP data and public resources. A PGS-wide scan then evaluates each PGS by the area under the curve (AUC) of prediction for a binary trait, or by the R-squared of association for a quantitative trait. We present two methods for PGS model development: SECRET-Best, which selects the most predictive PGS from the scan, and SECRET-WTSUM, which builds a combined score from multiple correlation-pruned PGSs. The algorithm is scalable and is implemented in a user-friendly software tool, SECRET (Screen and Evaluate Catalogued Risk scores to Enhance Trait predictions).
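The scan-then-select idea described above can be illustrated with a minimal sketch. This is not the SECRET implementation: the function names, the synthetic data, the pruning threshold of 0.8, and the choice of `AUC - 0.5` as the combination weight are all assumptions made for illustration, since the abstract does not specify how SECRET-WTSUM weights or prunes scores.

```python
import random
from math import sqrt


def auc(scores, labels):
    """Rank-based AUC: P(case score > control score), ties counted as 1/2."""
    pairs = sorted(zip(scores, labels))
    n, n_pos = len(pairs), sum(labels)
    rank_sum_pos, i = 0.0, 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1  # pairs[i:j] share one score value
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * (n - n_pos))


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardize(x):
    """Center to mean 0 and scale to standard deviation 1."""
    n = len(x)
    m = sum(x) / n
    sd = sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / sd for a in x]


def pgs_scan(pgs_table, labels):
    """Evaluate every PGS; return (pgs_id, auc) sorted from highest AUC down."""
    results = [(pid, auc(vals, labels)) for pid, vals in pgs_table.items()]
    return sorted(results, key=lambda r: r[1], reverse=True)


def prune(ranked, pgs_table, max_abs_r=0.8):
    """Greedy pruning: keep a PGS only if |r| < max_abs_r with all kept ones.
    The 0.8 threshold is an assumed value, not taken from the source."""
    kept = []
    for pid, a in ranked:
        if all(abs(pearson(pgs_table[pid], pgs_table[k])) < max_abs_r
               for k, _ in kept):
            kept.append((pid, a))
    return kept


def weighted_sum(kept, pgs_table):
    """Combine standardized PGSs with weight (AUC - 0.5); assumed weighting."""
    z = {pid: standardize(pgs_table[pid]) for pid, _ in kept}
    n = len(next(iter(z.values())))
    return [sum((a - 0.5) * z[pid][i] for pid, a in kept) for i in range(n)]


# Synthetic data (hypothetical): one informative PGS, a near-duplicate of it,
# and five pure-noise PGSs.
rng = random.Random(0)
n = 400
liability = [rng.gauss(0, 1) for _ in range(n)]
labels = [1 if g + rng.gauss(0, 1) > 0 else 0 for g in liability]
pgs_table = {"PGS_trait": [g + 0.3 * rng.gauss(0, 1) for g in liability]}
pgs_table["PGS_trait_dup"] = [v + 0.05 * rng.gauss(0, 1)
                              for v in pgs_table["PGS_trait"]]
for k in range(5):
    pgs_table[f"PGS_noise_{k}"] = [rng.gauss(0, 1) for _ in range(n)]

ranked = pgs_scan(pgs_table, labels)      # PGS-wide scan
best_id, best_auc = ranked[0]             # "best single PGS" selection
kept = prune(ranked, pgs_table)           # correlation pruning
combined = weighted_sum(kept, pgs_table)  # combined score per participant
```

In this toy setup the near-duplicate score is dropped by pruning because it is almost perfectly correlated with the informative one. The sketch scores and evaluates on the same samples for brevity; a real analysis would presumably run the scan on development samples and assess the selected or combined score on a separate validation set, as the Results describe.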

Results

We applied SECRET to a binary outcome (type 1 diabetes [T1D]) and to a dataset of 2,100 samples, each with 12 laboratory test-related continuous traits. Of the nine traits with existing PGSs available in the PGS Catalog, eight were predicted correctly, in the sense that a PGS for the same trait was identified as the top predictor. Existing PGS methods had rather limited power to predict trait values in a validation set when only 1,500 samples were used to develop a PGS model, whereas the SECRET methods maintained prediction power for underpowered GWAS even when the study sample size was in the hundreds.

Conclusion

The SECRET methods and tool provide a valuable resource for studies that have genomic data but limited sample sizes. The approach enables systematic development and evaluation of PGS models.

Version published to 10.1101/2025.10.03.680330 on bioRxiv
Oct 3, 2025

Related articles

Within-family validation of a new polygenic predictor of general cognitive ability

This article has 6 authors:
1. Tobias Wolfram
2. Spencer Moore
3. Jeremiah H. Li
4. Jonathan Anomaly
5. Ivan Davidson
6. Michael Christensen
This article has no evaluations
Latest version Dec 11, 2025

Application of longitudinal follow-up data increases power in the identification of genetic loci for type 2 diabetes

This article has 1 author:
1. Seong Beom Cho
This article has no evaluations
Latest version Dec 18, 2025

Derivation of prediction error variance for non-genotyped individuals in genomic selection

This article has 3 authors:
1. Vinícius Junqueira
2. Marcos Jun-Iti Yokoo
3. Fernando Flores
This article has no evaluations
Latest version Dec 17, 2025
