MultiPopPred: A Trans-Ethnic Disease Risk Prediction Method, and its Application to the South Asian Population
Abstract
Genome-wide association studies (GWAS) have contributed significantly to identifying disease-associated single nucleotide polymorphisms (SNPs) in Caucasian populations, but with limited focus on understudied, low-resource non-Caucasian populations. To bridge this gap, there have been active efforts over the years to understand and exploit the population-specific vs. shared aspects of the genotype-phenotype relation across different populations or ethnicities. However, no single approach consistently outperforms all other methods. Furthermore, the efficacy of transfer learning models that are simpler than existing approaches remains an open question. We propose MultiPopPred, a suite of novel and simple trans-ethnic polygenic risk score (PRS) estimation methods that tap into the shared genetic risk across populations and transfer information learned from multiple well-studied auxiliary populations to a less-studied target population. MultiPopPred employs a specially designed Nesterov-smoothed penalized shrinkage model and an L-BFGS optimization routine. Extensive comparative analyses performed on simulated genotype-phenotype data reveal that MultiPopPred improves PRS prediction in the South Asian population by 69% in settings with low target sample sizes, by 19% overall across all simulation settings, and by 73% overall across all semi-simulated settings when compared to state-of-the-art trans-ethnic PRS estimation methods. We further observe a 44% overall improvement in PRS prediction across 8 quantitative real-world traits from the UK Biobank. This performance trend is promising and encourages the application of MultiPopPred for reliable PRS estimation in resource-constrained real-world settings.
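To make the combination of Nesterov smoothing and L-BFGS concrete, the following is a minimal sketch and not the authors' implementation. It assumes a squared-error loss on individual-level target data and an L1-type penalty that shrinks the target effect sizes toward a single vector of auxiliary-population effect sizes; the abstract does not specify the actual objective. The function names (`smoothed_abs`, `objective`, `fit_target_effects`), the parameters `lam` and `mu`, and the toy data are all hypothetical. Nesterov smoothing replaces the non-differentiable absolute value with a Huber-like function, which gives L-BFGS the smooth gradient it needs.

```python
import numpy as np
from scipy.optimize import minimize


def smoothed_abs(z, mu):
    """Nesterov-smoothed |z| (Huber form): quadratic on [-mu, mu], linear outside."""
    a = np.abs(z)
    return np.where(a <= mu, z ** 2 / (2.0 * mu), a - mu / 2.0)


def objective(beta, X, y, beta_aux, lam, mu):
    """Assumed loss: half mean squared error plus a smoothed L1 pull toward beta_aux.

    Returns the objective value and its gradient with respect to beta.
    """
    n = X.shape[0]
    resid = X @ beta - y
    diff = beta - beta_aux
    value = 0.5 * (resid @ resid) / n + lam * smoothed_abs(diff, mu).sum()
    # The gradient of the smoothed absolute value is diff / mu clipped to [-1, 1].
    grad = X.T @ resid / n + lam * np.clip(diff / mu, -1.0, 1.0)
    return value, grad


def fit_target_effects(X, y, beta_aux, lam=0.1, mu=1e-2):
    """Minimise the smoothed objective with L-BFGS, starting from the auxiliary effects."""
    res = minimize(
        objective,
        x0=beta_aux.copy(),
        args=(X, y, beta_aux, lam, mu),
        jac=True,  # objective returns (value, gradient)
        method="L-BFGS-B",
    )
    return res.x


# Toy data (hypothetical): few target samples, many SNPs, informative auxiliary effects.
rng = np.random.default_rng(0)
n, p = 50, 200
beta_true = rng.normal(scale=0.1, size=p)
beta_aux = beta_true + rng.normal(scale=0.02, size=p)
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_hat = fit_target_effects(X, y, beta_aux)
```

In this sketch, `lam` controls how strongly the target estimate is tied to the auxiliary effects, and `mu` controls how closely the smoothed penalty approximates the absolute value; both would need tuning, for example by cross-validation, in any real application.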