Variant Set Distillation
Abstract
Allelic heterogeneity, the presence of multiple causal variants at a given locus, has been widely observed across human traits. Combining the association signals across these distinct causal variants at a given locus presents an opportunity to empower gene discovery. This opportunity is growing with the increasing population diversity and sequencing depth of emerging genomic datasets. However, the rapidly increasing number of null (non-causal) variants within these datasets makes it increasingly difficult for existing testing approaches to leverage allelic heterogeneity. We recently proposed Stable Distillation (SD), a general theoretical framework for sparse signal problems. Here we present vsdistill, an SD-based method that overcomes several major shortcomings of the simple SD procedures we initially proposed and introduces many innovations aimed at maximizing power in the context of genomics. We show via simulations that vsdistill provides a significant power boost over the popular STAAR method. vsdistill is available in our new R package gdistill, with core routines implemented in C. We also show that our method readily scales to large datasets by performing an association analysis of height in the UK Biobank.