A simple approach for multiple observations improves power to detect genetic effects and genomic prediction accuracy

Luke M. Evans
Christopher H. Arehart
Raine A. Gibson
Grace I. Bowman
Christopher R. Gignoux

Abstract

Many datasets, including widely used biobanks, have more than one observation of numerous phenotypes for at least a portion of their sample. The majority of GWAS utilize only a single observation per individual, even when more than one observation may be available, and apply a standard model in which the additive allelic effect being estimated is assumed to be constant across the age or time range in the sample. Here, we test a set of simple approaches to utilize multiple observations per individual, under this same assumption. We find that utilizing the mean or median of the available observations rather than a single observation improves power to detect associated loci and enriched gene sets and yields higher out-of-sample polygenic score prediction accuracy. Despite growing biobanks, many deeply phenotyped samples are relatively small but have multiple observations. While explicitly modeling age- or time-dependent genetic effects can estimate time- or age-specific genetic effects, most GWAS apply a standard, additive-only model. In that setting, a simple approach of using the mean or median can improve power by reducing “noise” in the phenotype, can be run with standard, optimized software, and can be particularly impactful for smaller samples, including samples of diverse genetic ancestry currently existing in widely used biobanks.
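
The noise-reduction argument in the abstract can be illustrated with a small simulation. The sketch below is not the authors' pipeline. It assumes a single variant with a constant additive effect, a stable per-individual component, and independent measurement noise on each observation. The function names (`collapse`, `r_squared`, `simulate`) and all parameter values (sample size, number of observations, effect size, allele frequency, noise level) are hypothetical choices made for illustration, and the squared genotype-phenotype correlation is used only as a rough proxy for association power.

```python
import random
import statistics


def collapse(observations, how="mean"):
    """Collapse each individual's repeated measurements into one phenotype value."""
    summarize = statistics.fmean if how == "mean" else statistics.median
    return {iid: summarize(values) for iid, values in observations.items()}


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate(n=20000, k=4, beta=0.3, maf=0.3, noise_sd=1.0, seed=1):
    """Simulate genotypes and k noisy observations per individual.

    All defaults are illustrative assumptions, not values from the preprint.
    """
    rng = random.Random(seed)
    genotypes, observations = {}, {}
    for iid in range(n):
        # Allele count (0, 1, or 2) under Hardy-Weinberg proportions.
        g = int(rng.random() < maf) + int(rng.random() < maf)
        # Stable part of the phenotype: constant additive effect plus
        # an individual-specific component that does not change over time.
        stable = beta * g + rng.gauss(0.0, 1.0)
        genotypes[iid] = g
        # Each observation adds independent measurement noise.
        observations[iid] = [stable + rng.gauss(0.0, noise_sd) for _ in range(k)]
    return genotypes, observations


genotypes, observations = simulate()
ids = sorted(genotypes)
g = [genotypes[i] for i in ids]

single = [observations[i][0] for i in ids]  # first observation only
mean_pheno = collapse(observations, "mean")
median_pheno = collapse(observations, "median")

r2_single = r_squared(g, single)
r2_mean = r_squared(g, [mean_pheno[i] for i in ids])
r2_median = r_squared(g, [median_pheno[i] for i in ids])

print(f"single observation: r2 = {r2_single:.4f}")
print(f"mean of observations: r2 = {r2_mean:.4f}")
print(f"median of observations: r2 = {r2_median:.4f}")
```

Under these assumptions, averaging k observations shrinks the measurement-noise variance by a factor of k while leaving the genetic signal unchanged, so the collapsed phenotype should correlate more strongly with genotype than any single observation does. The collapsed values can then be passed to standard GWAS software as an ordinary one-value-per-individual phenotype.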

Version published to 10.1101/2025.09.19.25336197 on medRxiv
Sep 21, 2025

Related articles

Application of longitudinal follow-up data increases power in the identification of genetic loci for type 2 diabetes

This article has 1 author:
1. Seong Beom Cho
This article has no evaluations. Latest version: Dec 18, 2025

A resource of “bottom-line” variant associations for 1,281 complex traits by integrating data across published genome-wide association studies

This article has 24 authors:
1. Trang Nguyen
2. Furkan Büyükgöl
3. Patrick Smadbeck
4. Jeffrey Massung
5. Maria Costanzo
6. Monica Ruiz
7. Peter Dornbos
8. Satoshi Yoshiji
9. Ryan Koesterer
10. Thanh Long Nguyen
11. Dongkeun Jang
12. Quy Hoang
13. Yue Ji
14. Aoife McMahon
15. Sebanti Sengupta
16. Xianyong Yin
17. Brady Ryan
18. Ryan Welch
19. Jorien Treur
20. Connie Bezzina
21. Gonçalo R. Abecasis
22. Michael Boehnke
23. Noel Burtt
24. Jason Flannick
This article has no evaluations. Latest version: Jan 22, 2026

Causal effect heterogeneity estimation using summary statistics

This article has 8 authors:
1. Xingjie Shi
2. Yadong Yang
3. Minxi Bai
4. Jiacheng Miao
5. Stephen Dorn
6. Jonathan Haugstad
7. Jin Liu
8. Qiongshi Lu
This article has no evaluations. Latest version: Jan 14, 2026
