Performance of deep-learning based approaches to improve polygenic scores

Abstract

Background/Objectives

Polygenic scores (PGS), which estimate an individual’s genetic propensity for a disease or trait, have the potential to become part of genomic healthcare. To maximise the predictive performance of PGS, neural-network (NN) based deep learning has attracted intense interest as a way to model complex, nonlinear phenomena, and it may be adapted to exploit gene-gene (G x G) and gene-environment (G x E) interactions.

Methods

We present a framework that uses NNs to infer the amount of nonlinearity present in a phenotype while controlling for the potential confounding effect of correlation between genetic variants, i.e. linkage disequilibrium (LD). We fit NN models to both simulated traits and 28 real disease and anthropometric traits in the UK Biobank.

Results

Simulations confirmed that our framework adequately controls for LD and can infer nonlinear effects when such effects genuinely exist. Using this approach on real data, we found evidence for small amounts of nonlinearity due to G x G and G x E, which mildly improved prediction performance (r²) by ∼7% and ∼4%, respectively. Despite this evidence for nonlinear effects, NN models were outperformed by linear regression models in both the genetic-only and the genetic+environmental input scenarios, with ∼7% and ∼5% differences in r², respectively. Importantly, we found substantial evidence for confounding by joint tagging effects, whereby inferred G x G was actually due to LD with unaccounted-for additive genetic variants.

Conclusion

Our results indicate that the usefulness of NNs for generating polygenic scores for common traits and diseases may currently be limited and may be confounded by joint tagging effects due to LD.
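
The joint tagging effect described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the framework used in the study: it uses ordinary least squares with a product term instead of an NN, and the haplotype model (a hypothetical unobserved causal allele C that is present only on haplotypes carrying both observed alleles A and B), the sample size, and the effect size are all invented for illustration. The phenotype is purely additive in C, yet a model that sees only the tagging variants A and B appears to find a G x G interaction.

```python
import numpy as np

def r2(X, y):
    """In-sample r^2 of an ordinary least-squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 20_000  # hypothetical sample size

# Two haplotypes per individual; alleles at A and B are drawn independently.
hapA = rng.integers(0, 2, size=(n, 2))
hapB = rng.integers(0, 2, size=(n, 2))
# Assumption: the causal allele C sits only on haplotypes carrying both A and B.
hapC = hapA * hapB

# Unphased genotype dosages (0, 1 or 2).
gA, gB, gC = hapA.sum(axis=1), hapB.sum(axis=1), hapC.sum(axis=1)

# The phenotype is strictly additive in the causal variant, plus noise.
y = gC + rng.normal(size=n)

# Case 1: only the tagging variants are observed.
r2_add_tagged = r2(np.column_stack([gA, gB]), y)
r2_int_tagged = r2(np.column_stack([gA, gB, gA * gB]), y)
gain_tagged = r2_int_tagged - r2_add_tagged

# Case 2: the causal variant is also included in the model.
r2_add_full = r2(np.column_stack([gA, gB, gC]), y)
r2_int_full = r2(np.column_stack([gA, gB, gC, gA * gB]), y)
gain_full = r2_int_full - r2_add_full

print(f"r2 gain from A x B term, causal variant unobserved: {gain_tagged:.4f}")
print(f"r2 gain from A x B term, causal variant included:   {gain_full:.4f}")
```

In this sketch the product term improves r² only while the causal variant is left out of the model; once its additive effect is accounted for, the apparent interaction contributes almost nothing, which is the pattern a joint tagging effect would be expected to produce.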
