Modelling complex traits with ancestral recombination graphs
Abstract
The ancestral recombination graph (ARG) is a powerful tool for storing and analysing large genomic datasets, as demonstrated by the ecosystem of software tools that use the succinct tree sequence (tskit) format to store and analyse ARGs. Tree sequences have become standard in evolutionary genetics, and researchers are now exploring new applications to address the analytical challenges arising from the rapid growth of data. To this end, we propose ARG-LMM, a generative model that describes how complex traits emerge from ARGs, and tslmm, a tree sequence-based algorithm that fits ARG-LMM to simulated and inferred genealogies. The distribution of a trait is deduced from evolutionary theory, avoiding arbitrary probabilistic and statistical assumptions. The model subsumes existing quantitative genetics models by considering all past DNA coalescence and recombination events encoded in an ARG. This formulation provides a framework to study the effect of evolutionary forces on genomes and corresponding phenotypes. tslmm can work with ARGs encoding whole genomes of tens of thousands of individuals through efficient graph traversal algorithms with 𝒪(N) time complexity, up to a logarithmic factor, a step change over most existing algorithms, which require at least 𝒪(N²) operations. In summary, this work presents a new quantitative genetic model (ARG-LMM) based on evolutionary theory and new software (tslmm) for scalable analysis of complex traits with large genomic datasets.
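To give some intuition for why graph traversal can avoid quadratic cost, the following is a minimal sketch and not the tslmm implementation. It assumes a single toy tree rather than a full ARG, and it assumes that the covariance between two samples is the summed weight of the branches they share. The tree, the branch weights, and the function names `branch_matvec` and `dense_matrix` are all hypothetical and chosen only for illustration. Under these assumptions, the product of the N × N covariance matrix with a vector can be obtained from one upward and one downward pass over the tree, without ever forming the matrix.

```python
# Toy tree: parent[i] is the parent of node i, and -1 marks the root.
# Nodes 0..3 are samples; nodes 4, 5 and 6 are internal (6 is the root).
# Assumption: every child has a smaller id than its parent.
parent = [4, 4, 5, 5, 6, 6, -1]
# weight[i] is a hypothetical weight of the branch above node i.
weight = [1.0, 1.0, 0.5, 0.5, 2.0, 2.5, 0.0]
num_samples = 4


def branch_matvec(parent, weight, num_samples, v):
    """Return G @ v without forming G, where G[i][j] is the summed
    weight of the branches ancestral to both sample i and sample j."""
    n = len(parent)
    # Upward pass: below[u] is the sum of v over the samples beneath u.
    below = [0.0] * n
    for i in range(num_samples):
        below[i] = v[i]
    for node in range(n):
        p = parent[node]
        if p != -1:
            below[p] += below[node]
    # Downward pass: above[u] accumulates weight * below along the
    # path from the root down to u.
    above = [0.0] * n
    for node in reversed(range(n)):
        p = parent[node]
        inherited = above[p] if p != -1 else 0.0
        above[node] = inherited + weight[node] * below[node]
    return above[:num_samples]


def dense_matrix(parent, weight, num_samples):
    """Reference construction of G with quadratic cost, for checking."""
    def path_to_root(u):
        nodes = set()
        while u != -1:
            nodes.add(u)
            u = parent[u]
        return nodes

    paths = [path_to_root(i) for i in range(num_samples)]
    return [
        [sum(weight[u] for u in paths[i] & paths[j])
         for j in range(num_samples)]
        for i in range(num_samples)
    ]


v = [1.0, 2.0, 3.0, 4.0]
fast = branch_matvec(parent, weight, num_samples, v)
G = dense_matrix(parent, weight, num_samples)
slow = [sum(G[i][j] * v[j] for j in range(num_samples))
        for i in range(num_samples)]
print(fast)
print(slow)
```

Both passes visit each node once, so the cost of the product grows linearly with the number of nodes in this toy tree, whereas the dense reference builds all N² pairwise entries. An ARG contains many correlated local trees, so an actual algorithm has to do more than this single-tree sketch.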