Improving Polygenic Score Prediction for Underrepresented Groups Through Transfer Learning

Abstract

The advent of big data from GWAS consortia and biobanks has led to remarkable improvements in polygenic score (PGS) prediction accuracy. However, most PGS were derived using data from Europeans (EU) and perform poorly when used to predict phenotypes of non-Europeans. Transfer Learning (TL) is a technique by which knowledge gained using data from one population is used to improve a model’s performance in another population. Here, we present GPTL, an R package implementing three methods to build PGS using TL: gradient descent with early stopping, a penalized regression that shrinks variant effect estimates toward prior values, and a Bayesian model using a finite mixture prior that enables TL from multiple prior sources of information. Using simulated and real data from the UK Biobank and All of Us, we showed that PGS derived using the TL algorithms implemented in GPTL performed better than PGS derived with EU or non-EU data only, and also outperformed commonly used methods for building PGS from multi-ancestry data, in terms of both accuracy and computational performance.
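
To make the first of the three methods concrete, the following is a minimal sketch of gradient descent with early stopping, assuming a squared-error loss, effects initialized at prior (for example EU-derived) estimates, and a held-out validation set from the target population to decide when to stop. It is written in plain Python for illustration only; it is not the GPTL API, and the function name, arguments, and toy data are all hypothetical.

```python
def tl_gradient_descent(X_train, y_train, X_val, y_val, prior_effects,
                        learning_rate=0.01, max_iter=200):
    """Hypothetical sketch: refine prior variant effects on target-population data.

    Starts at prior_effects, takes gradient steps on the training mean squared
    error, and stops as soon as the validation error fails to improve.
    Returns the best effects found and their validation error.
    """
    def predict(X, effects):
        # Polygenic score: genotype row times effect vector.
        return [sum(x * b for x, b in zip(row, effects)) for row in X]

    def mse(X, y, effects):
        return sum((p - t) ** 2 for p, t in zip(predict(X, effects), y)) / len(y)

    effects = list(prior_effects)  # transfer step: start from the prior estimates
    best_effects, best_loss = list(effects), mse(X_val, y_val, effects)
    n = len(y_train)

    for _ in range(max_iter):
        residuals = [p - t for p, t in zip(predict(X_train, effects), y_train)]
        gradient = [
            2.0 / n * sum(r * row[j] for r, row in zip(residuals, X_train))
            for j in range(len(effects))
        ]
        effects = [b - learning_rate * g for b, g in zip(effects, gradient)]

        val_loss = mse(X_val, y_val, effects)
        if val_loss >= best_loss:
            break  # early stopping: validation error no longer improves
        best_effects, best_loss = list(effects), val_loss

    return best_effects, best_loss


# Toy data (made up): two variants, noiseless phenotype with effects [1.0, 0.5].
X_train = [[1, 0], [0, 1], [1, 1], [1, -1]]
y_train = [1.0, 0.5, 1.5, 0.5]
X_val = [[2, 1], [1, 2]]
y_val = [2.5, 2.0]
prior = [0.8, 0.8]  # stand-in for effects estimated in another population

effects, val_loss = tl_gradient_descent(X_train, y_train, X_val, y_val, prior)
```

Stopping early keeps the estimates close to the prior values when the target sample is small, which is the sense in which the iteration count acts as a shrinkage parameter in this sketch.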
