A simple approach for multiple observations improves power to detect genetic effects and genomic prediction accuracy
Abstract
Many datasets, including widely used biobanks, contain more than one observation of numerous phenotypes for at least part of their sample. Most GWAS nevertheless use only a single observation per individual, even when more are available, and apply a standard model in which the additive allelic effect being estimated is assumed to be constant across the age or time range of the sample. Here, we test a set of simple approaches that use multiple observations per individual under this same assumption. We find that using the mean or median of the available observations, rather than a single observation, improves power to detect associated loci and enriched gene sets and yields higher out-of-sample polygenic score prediction accuracy. Despite the growth of biobanks, many deeply phenotyped samples remain relatively small but have multiple observations per individual. Explicitly modeling age- or time-dependent genetic effects can estimate age- or time-specific effects, but most GWAS apply a standard, additive-only model. Within that standard model, simply using the mean or median can improve power by reducing “noise” in the phenotype, works with standard, optimized software, and can be particularly impactful for smaller samples, including the samples of diverse genetic ancestry that currently exist in widely used biobanks.
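The intuition behind the mean/median approach can be illustrated with a small simulation. This is a minimal sketch, not the authors' pipeline, and it assumes a hypothetical model in which each observation is y_ij = β·g_i + u_i + e_ij: g_i is an allele count, u_i is a stable person-level effect, and e_ij is independent per-observation noise with variance σ_e². The allelic effect β is constant across observations, matching the assumption in the abstract. Averaging k observations shrinks the noise term's variance from σ_e² to σ_e²/k while leaving β unchanged, so the same additive regression should yield a larger test statistic on average. The function names (`simulate_cohort`, `additive_z`, `compare_summaries`) and all parameter values are illustrative assumptions, not taken from the study.

```python
import math
import random
import statistics


def simulate_cohort(n_individuals, n_obs, beta, rng, maf=0.3, stable_sd=1.0, noise_sd=3.0):
    """Simulate one SNP and n_obs noisy observations of a phenotype per person.

    Hypothetical model: y_ij = beta * g_i + u_i + e_ij, where g_i is the allele
    count (0, 1 or 2), u_i is a stable person-level effect and e_ij is
    independent observation-level noise. beta is constant across observations.
    """
    genotypes, observations = [], []
    for _ in range(n_individuals):
        g = sum(rng.random() < maf for _ in range(2))  # two alleles, each minor with prob. maf
        u = rng.gauss(0.0, stable_sd)
        obs = [beta * g + u + rng.gauss(0.0, noise_sd) for _ in range(n_obs)]
        genotypes.append(g)
        observations.append(obs)
    return genotypes, observations


def additive_z(genotypes, phenotype):
    """Slope / SE from an OLS fit of phenotype on allele count (a t statistic, ~z for large n)."""
    n = len(genotypes)
    mean_g = statistics.fmean(genotypes)
    mean_y = statistics.fmean(phenotype)
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    rss = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se


def compare_summaries(n_individuals=2000, n_obs=4, beta=0.3, replicates=20, seed=7):
    """Average test statistic across simulated replicates for three per-person summaries."""
    totals = {"single": 0.0, "mean": 0.0, "median": 0.0}
    for r in range(replicates):
        rng = random.Random(seed + r)  # fixed seeds keep the comparison reproducible
        genotypes, observations = simulate_cohort(n_individuals, n_obs, beta, rng)
        summaries = {
            "single": [obs[0] for obs in observations],  # one observation per person
            "mean": [statistics.fmean(obs) for obs in observations],
            "median": [statistics.median(obs) for obs in observations],
        }
        for name, phenotype in summaries.items():
            totals[name] += additive_z(genotypes, phenotype)
    return {name: total / replicates for name, total in totals.items()}


if __name__ == "__main__":
    for name, z in compare_summaries().items():
        print(f"{name:>6}: mean z = {z:.2f}")
```

Under these assumed settings, the mean and median summaries should give noticeably larger average statistics than a single observation, because each regression is unchanged and only the phenotype's noise variance differs. The median is included because it is less sensitive to an occasional outlying measurement; with Gaussian noise, as here, the mean is expected to be slightly more efficient.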