Modeling nonlinear and interaction effects of spatiotemporal and other non-genetic factors improves phenotypic prediction for complex traits



Abstract

Adjusting for non-genetic factors can improve genetic association testing and polygenic prediction, yet most studies rely on linear adjustments for a small set of covariates. Location and time covariates are typically not included in genetic studies, but they can serve as useful proxies for environmental exposures such as climate, pollution, and behavioural differences. These features especially highlight the weakness of linear covariate adjustments, which cannot capture their cyclical or nonlinear effects or interactions between covariates. To evaluate the benefits of modeling nonlinear covariate effects, including spatiotemporal features, we adopted a null model approach in which an auxiliary nonlinear model predicts the phenotype from non-genetic covariates alone. This prediction is then included as an additional covariate in downstream analysis, allowing association studies, polygenic scores, and fine-mapping to account for nonlinear covariate effects and interactions without modifying existing tools. Using 16 continuous phenotypes from the UK Biobank, we first evaluated gradient boosted decision tree (GBDT) and neural network null models with combinations of additional time-of-day, time-of-year, and birth or home location covariates and determined how well these models could predict phenotypes from covariates alone compared to linear models. A GBDT null improved average R² by 4.3% over a linear null across all phenotype-covariate set pairs, and models including spatiotemporal covariates achieved the best performance for all phenotypes. Incorporating nonlinear null predictions with additional spatiotemporal features improved polygenic scores for all phenotypes, with median improvements of 7.3% for BASIL and 15.5% for PRS-CS over the standard approach. We additionally tested three case-control phenotypes and observed improvement beyond the 95% CI of the standard approach for depression.
Finally, model interpretation identified both known and novel covariate interactions and nonlinear effects. This highlighted several important cases which linear adjustments would be unable to capture, including examples where phenotype means exhibit opposite age-related trends between sexes, cyclical temporal patterns for vitamin D reflecting seasonal variation, and complex spatial dependencies between birth and home locations. Together, these results demonstrate that a small addition to existing genetic analysis workflows can improve predictive performance and provide new insights on environmental effects for complex traits.
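
The null model idea described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' pipeline: it uses only the Python standard library, replaces the GBDT with a crude per-cell mean over (sex, age bin, month) as a stand-in nonlinear model, and simulates a phenotype with a seasonal effect and opposite age trends between sexes. All function names, effect sizes, and bin widths are assumptions chosen for illustration.

```python
import math
import random

random.seed(0)

def simulate(n):
    """Simulated cohort; every effect size here is hypothetical."""
    rows = []
    for _ in range(n):
        g = random.gauss(0, 1)            # stand-in for a polygenic score
        age = random.uniform(40, 70)
        sex = random.randint(0, 1)
        day = random.uniform(0, 365)      # day of year of assessment
        seasonal = math.cos(2 * math.pi * (day - 200) / 365)  # cyclical effect
        age_trend = 0.05 * (age - 55) * (1 if sex == 1 else -1)  # opposite by sex
        y = 0.5 * g + seasonal + age_trend + random.gauss(0, 1)
        rows.append({"g": g, "age": age, "sex": sex, "day": day, "y": y})
    return rows

def ols_fit(X, y):
    """Least squares with intercept, solved via normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for c in range(p):                    # Gauss-Jordan with partial pivoting
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def ols_predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]

def r_squared(y, yhat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def cell(row):
    """Partition covariate space: sex x 4 age bins x 12 month-like bins."""
    age_bin = min(int((row["age"] - 40) / 7.5), 3)
    month_bin = min(int(row["day"] / 365 * 12), 11)
    return (row["sex"], age_bin, month_bin)

def fit_null(rows):
    """Stage 1 stand-in: mean phenotype per covariate cell (no genetics)."""
    sums = {}
    for r in rows:
        s, n = sums.get(cell(r), (0.0, 0))
        sums[cell(r)] = (s + r["y"], n + 1)
    overall = sum(r["y"] for r in rows) / len(rows)
    return {k: s / n for k, (s, n) in sums.items()}, overall

def null_predict(null, row):
    means, overall = null
    return means.get(cell(row), overall)  # fall back to the overall mean

def design(rows, null=None):
    """Stage 2 design matrix; the null prediction is one extra column."""
    X = [[r["g"], r["age"], r["sex"], r["day"]] for r in rows]
    if null is not None:
        X = [x + [null_predict(null, r)] for x, r in zip(X, rows)]
    return X

data = simulate(6000)
train, test = data[:4000], data[4000:]
y_train = [r["y"] for r in train]
y_test = [r["y"] for r in test]

# Standard approach: linear adjustment for the covariates.
beta_lin = ols_fit(design(train), y_train)
r2_linear = r_squared(y_test, ols_predict(beta_lin, design(test)))

# Null model approach: same linear model plus the stage 1 prediction.
null = fit_null(train)
beta_null = ols_fit(design(train, null), y_train)
r2_with_null = r_squared(y_test, ols_predict(beta_null, design(test, null)))

print(f"test R2, linear covariates only: {r2_linear:.3f}")
print(f"test R2, plus null prediction:   {r2_with_null:.3f}")
```

Because the stage 1 prediction enters stage 2 as an ordinary covariate column, the downstream linear model needs no modification, which mirrors the workflow property the abstract emphasizes. In this sketch the null model is fit and then reused on the same training rows for simplicity; a real analysis would likely use out-of-fold predictions to limit leakage.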
