Improving Polygenic Score Prediction for Underrepresented Groups Through Transfer Learning

Hao Wu
Paulino Pérez-Rodríguez
Michael Boehnke
Yuehua Cui
Xiaoyu Liang
Ana I. Vazquez
Gustavo de los Campos

Abstract

The advent of big data from GWAS consortia and biobanks has led to remarkable improvements in polygenic score (PGS) prediction accuracy. However, most PGS were derived using data from Europeans (EU) and performed poorly when used to predict phenotypes of non-Europeans. Transfer Learning (TL) is a technique by which knowledge gained using data from one population is used to improve a model’s performance in another population. Here, we present GPTL, an R package implementing three methods to build PGS using TL: gradient descent with early stopping, a penalized regression that shrinks variant effect estimates toward prior values, and a Bayesian model using a finite mixture prior that enables TL from multiple prior sources of information. Using simulated and real data from the UK Biobank and All of Us, we showed that PGS derived using the TL algorithms implemented in GPTL performed better than PGS derived with EU or non-EU data only, and also outperformed commonly used methods to build PGS using multi-ancestry data, both in accuracy and in computational performance.
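
To make the first of the three approaches concrete, here is a minimal sketch of transfer learning by gradient descent with early stopping. It is not the GPTL API and is not taken from the preprint: the function names, the toy genotype data, the learning rate, and the patience rule are all assumptions chosen for illustration. The idea shown is that variant effects start at prior estimates (for example, from a European-ancestry GWAS), are updated on target-population data, and stop updating once validation error no longer improves.

```python
import random


def predict(X, beta):
    """Linear polygenic score: weighted sum of genotypes per individual."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def mse(X, y, beta):
    """Mean squared prediction error."""
    return sum((p - t) ** 2 for p, t in zip(predict(X, beta), y)) / len(y)


def gradient(X, y, beta):
    """Gradient of the mean squared error with respect to beta."""
    n = len(y)
    resid = [p - t for p, t in zip(predict(X, beta), y)]
    return [2.0 / n * sum(X[i][j] * resid[i] for i in range(n))
            for j in range(len(beta))]


def tl_gradient_descent(X_tr, y_tr, X_val, y_val, beta_prior,
                        lr=0.02, max_iter=500, patience=10):
    """Start at the prior effects and stop when validation loss stalls.

    Hypothetical helper for illustration only (not a GPTL function).
    """
    beta = list(beta_prior)
    best_beta, best_loss = list(beta), mse(X_val, y_val, beta)
    stale = 0
    for _ in range(max_iter):
        g = gradient(X_tr, y_tr, beta)
        beta = [b - lr * gj for b, gj in zip(beta, g)]
        loss = mse(X_val, y_val, beta)
        if loss < best_loss:
            best_beta, best_loss, stale = list(beta), loss, 0
        else:
            stale += 1
            if stale >= patience:  # early stopping
                break
    return best_beta, best_loss


# Toy data (assumed): 5 variants coded 0/1/2, small target-population sample.
rng = random.Random(0)
beta_target = [0.5, -0.3, 0.0, 0.8, 0.2]  # effects in the target population
beta_prior = [0.4, -0.1, 0.1, 0.5, 0.2]   # effects estimated elsewhere


def simulate(n):
    X = [[rng.choice([0, 1, 2]) for _ in beta_target] for _ in range(n)]
    y = [s + rng.gauss(0, 0.5) for s in predict(X, beta_target)]
    return X, y


X_tr, y_tr = simulate(60)
X_val, y_val = simulate(40)

prior_loss = mse(X_val, y_val, beta_prior)
beta_tl, tl_loss = tl_gradient_descent(X_tr, y_tr, X_val, y_val, beta_prior)
print("validation MSE with prior effects:", prior_loss)
print("validation MSE after transfer learning:", tl_loss)
```

Because the best validation loss is initialized at the prior's loss, the returned effects never do worse than the prior on the validation set. In this sketch, stopping early acts as the regularizer that keeps the estimates close to their starting values when the target sample is small.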

Version published to 10.1101/2025.10.08.25337572 on medRxiv
Oct 9, 2025

Related articles

Within-family validation of a new polygenic predictor of general cognitive ability

This article has 6 authors:
1. Tobias Wolfram
2. Spencer Moore
3. Jeremiah H. Li
4. Jonathan Anomaly
5. Ivan Davidson
6. Michael Christensen
This article has no evaluations. Latest version: Dec 11, 2025

Approximating prediction error variances and reliabilities in multiple-trait genomic prediction model using Monte Carlo sampling

This article has 5 authors:
1. Antero Heikkilä
2. Ismo Strandèn
3. Martin Lidauer
4. Klaus Nordhausen
5. Sara Taskinen
This article has no evaluations. Latest version: Dec 15, 2025

Bayesian Network Structure Learning from Incomplete Breast Cancer Data Using Structural Expectation–Maximization

This article has 3 authors:
1. Navaee Lavasani Monireh
2. Rezaeitabar Vahid
3. Khayamzadeh Maryam
This article has no evaluations. Latest version: Dec 10, 2025
