Clade distillation for genome-wide association studies

Ryan Christ
Xinxin Wang
Louis J M Aslett
David Steinsaltz
Ira Hall

Abstract

Testing inferred haplotype genealogies for association with phenotypes has been a longstanding goal in human genetics given their potential to detect association signals driven by allelic heterogeneity—when multiple causal variants modulate a phenotype—in both coding and noncoding regions. Recent scalable methods for inferring locus-specific genealogical trees along the genome, or representations thereof, have made substantial progress towards this goal; however, the problem of testing these trees for association with phenotypes has remained unsolved due to the growth in the number of clades with increasing sample size. To address this issue, we introduce several practical improvements to the kalis ancestry inference engine, including a general optimal checkpointing algorithm for decoding hidden Markov models, thereby enabling efficient genome-wide analyses. We then propose LOCATER, a powerful new procedure based on the recently proposed Stable Distillation framework, to test local tree representations for trait association. Although LOCATER is demonstrated here in conjunction with kalis, it may be used for testing output from any ancestry inference engine, regardless of whether such engines return discrete tree structures, relatedness matrices, or some combination of the two at each locus. Using simulated quantitative phenotypes, our results indicate that LOCATER achieves substantial power gains over traditional single marker testing, ARG-Needle, and window-based testing in cases of allelic heterogeneity, while also improving causal region localization. These findings suggest that genealogy-based association testing will be a fruitful approach for gene discovery, especially for signals driven by multiple ultra-rare variants.

Version published to 10.1093/genetics/iyaf158
Aug 7, 2025
Version published to 10.1101/2024.09.30.615852 on bioRxiv
Oct 1, 2024

