Estimating dispersal rates and locating genetic ancestors with genome-wide genealogies

Curation statements for this article:
  • Curated by eLife

Abstract

Spatial patterns in genetic diversity are shaped by individuals dispersing from their parents and larger-scale population movements. It has long been appreciated that these patterns of movement shape the underlying genealogies along the genome leading to geographic patterns of isolation by distance in contemporary population genetic data. However, extracting the enormous amount of information contained in genealogies along recombining sequences has, until recently, not been computationally feasible. Here we capitalize on important recent advances in genome-wide gene-genealogy reconstruction and develop methods to use thousands of trees to estimate per-generation dispersal rates and to locate the genetic ancestors of a sample back through time. We take a likelihood approach in continuous space using a simple approximate model (branching Brownian motion) as our prior distribution of spatial genealogies. After testing our method with simulations we apply it to Arabidopsis thaliana. We estimate a dispersal rate of roughly 60 km² per generation, slightly higher across latitude than across longitude, potentially reflecting a northward post-glacial expansion. Locating ancestors allows us to visualize major geographic movements, alternative geographic histories, and admixture. Our method highlights the huge amount of information about past dispersal events and population movements contained in genome-wide genealogies.
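
To make the likelihood idea in the abstract concrete, here is a minimal sketch of estimating a dispersal rate under plain Brownian motion on a single, known tree. It is not the authors' implementation: the paper combines thousands of inferred trees, whereas this sketch handles one tree and assumes all samples are contemporary. The function names, the toy tree, and the coordinates are hypothetical. The underlying fact is standard: under Brownian motion along a tree, sample locations are multivariate normal with covariance proportional to the time each pair of lineages shares below the root.

```python
import numpy as np


def shared_time_matrix(tmrca, root_time):
    """Time each pair of lineages shares between the root and their MRCA.

    tmrca[i, j] is the time (generations ago) of the most recent common
    ancestor of samples i and j, with zeros on the diagonal because the
    samples are assumed to be contemporary (an assumption of this sketch).
    """
    return root_time - np.asarray(tmrca, dtype=float)


def bm_dispersal_mle(S, locations):
    """Maximum-likelihood root location and per-generation dispersal covariance.

    Model: locations ~ Normal(root, Sigma (x) S), i.e. Brownian motion run
    down a tree whose shared-time matrix is S.
    """
    S = np.asarray(S, dtype=float)
    X = np.asarray(locations, dtype=float)      # shape (n_samples, n_dims)
    n = X.shape[0]
    ones = np.ones(n)
    # Generalized least squares estimate of the root (ancestral) location.
    root = (ones @ np.linalg.solve(S, X)) / (ones @ np.linalg.solve(S, ones))
    R = X - root                                # residuals around the root
    # ML estimate divides by n; dividing by n - 1 gives the REML variant.
    Sigma = R.T @ np.linalg.solve(S, R) / n
    return root, Sigma


# Hypothetical toy tree ((A, B), C): A and B coalesce 2 generations ago,
# and everything coalesces at the root 10 generations ago.
tmrca = np.array([[0, 2, 10],
                  [2, 0, 10],
                  [10, 10, 0]])
S_toy = shared_time_matrix(tmrca, root_time=10)
xy_km = np.array([[0.0, 0.0], [10.0, 5.0], [120.0, -40.0]])  # made-up coordinates
root_hat, Sigma_hat = bm_dispersal_mle(S_toy, xy_km)
print("estimated root location:", root_hat)
print("estimated dispersal covariance per generation:\n", Sigma_hat)
```

The diagonal of the estimated covariance gives a separate rate along each axis, which is the kind of quantity behind the latitude versus longitude comparison in the abstract.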

Article activity feed

  1. Evaluation Summary:

    The relationship between homologous chromosomes sampled in a population can be described by an "ancestral recombination graph" or as a "forest" of correlated coalescent trees describing the relationship at each locus on the chromosome. It has long been clear that this graph contains enormous amounts of information about the history of the population, and should be used in analysis. Hitherto this has been computationally infeasible, but recently developed methods are starting to make it possible, and this paper is one of the first attempts to do so. The paper should be of interest to anyone working with population genetic inference, although there are concerns about possible bias in the estimates from the 1001 Arabidopsis Genomes that need to be resolved.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

  2. Reviewer #1 (Public Review):

    The relationship between homologous chromosomes sampled in a population can be described by an "ancestral recombination graph" or as a "forest" of correlated coalescent trees describing the relationship at each locus on the chromosome. It has long been clear that this graph contains enormous amounts of information about the history of the population, and should be used in analysis. Hitherto this has been computationally infeasible, but recently developed methods are starting to make it possible, and this paper is one of the first attempts to do so.

    The approach is cutting edge, and appears to work in theory. However, some of the inferences from real data, in this case the 1001 Arabidopsis Genomes, seem implausible, in particular the conclusion of rapid recent migration - hundreds of kilometers in tens of generations. At the very least, these estimates need to be put in the context of data on local populations being stable for over 100 years, of outcrossing only happening on the order of every 20 generations, and of a general pattern of strong population structure. The alternative explanation, that the algorithm is producing biased results, needs to be excluded.

    If this can be resolved, the paper should be of major interest to anyone interested in population genetics.

  3. Reviewer #2 (Public Review):

    Strengths:

    The method leverages inferred genealogies to infer the locations and dispersal rates of ancestors. If the inferred genealogies are sufficiently accurate, this approach should be nearly unrivalled in its power to achieve this aim. Furthermore, the ability to locate ancestors and trace migrations could be of great importance, and has the potential to capture histories more accurately than simpler approaches that assume discrete events and no explicit spatial model.

    Weaknesses:

    A potential weakness is that the data most likely contain little information about the locations of ancestors in deeper time, and it is unclear how to identify when estimates become unreliable. Relatedly, there is a potential challenge in interpreting inferred ancestral locations - in particular, when shifts (or the lack of them) could be caused by sampling biases or underrepresentation of certain ancestries. Both points are sufficiently caveated in the paper, but could pose challenges when applying this approach in practice.

    Impact and utility:

    The method was applied to A. thaliana, but should readily be applicable to any recombining species and in particular to human data. There, we have extensive data of ancient human groups, which may benefit this approach and could reveal important insights into ancestral migrations of human groups. Quantifying evidence of migrations beyond qualitative measures (e.g., of gene-flow) is important and has the potential to capture more subtle signals, including separation by distance or continued movements through time.

    Overall, I believe that the authors are presenting a good method, which will be difficult to improve upon and can be applied to a wide range of problems. I am convinced by most of their results and conclusions. The overestimation of dispersal rates in simulations is interesting and could be investigated further. The inferred increase in recent dispersal rates of A. thaliana could potentially be (in part) an artefact of excluding rare variants from the data, and this should be checked by the authors.

  4. Reviewer #3 (Public Review):

    This paper presents a method to estimate the spatial locations of ancestors based on inferred genetic ancestry from recombining species. Given an estimated ancestry (in the form of a sequence of time-resolved trees along the genome), the method can estimate dispersal rates through time as well as the locations of genetic ancestors, based on a branching Brownian motion model of spatial dispersal. The inference method performs well on data produced by detailed forwards-time simulations, both with the true simulated trees and with trees inferred by Relate.

    The authors apply the inference method to a large Arabidopsis thaliana data set, first estimating trees using Relate, and then applying their inference methods to the trees. (The estimated trees have been made freely available on Zenodo, which will be a valuable community resource.) They detect very high dispersal rates in the recent past (especially East-West), and show many interesting visualisations of population structure over time.

    This paper is important not just for the method it introduces and the inferences made, but because it showcases what is possible given the newly available estimates of genetic ancestry. Population genetics is today mostly concerned with extant individuals, and the effects of historical processes on their genomes. Now that we have estimated trees we can begin to ask questions directly about genetic ancestors as well as samples. This paper helps to answer one fundamental question (where did the ancestors live?), but there are many more, and the methodology developed here will help shape those questions.
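
The reviews above describe locating genetic ancestors under a Brownian motion model. As a rough illustration only, and not the authors' code, the sketch below computes the expected location of one ancestor on a single known tree using the standard conditional mean of a multivariate normal. The function name, the toy tree, and the coordinates are hypothetical, and samples are assumed to be contemporary.

```python
import numpy as np


def locate_ancestor(S, s_anc, locations):
    """Expected location of an ancestor given sample locations, under Brownian motion.

    S         : (n, n) shared-time matrix among samples (time from the root
                to each pair's most recent common ancestor).
    s_anc     : (n,) time the ancestor's lineage shares with each sample's
                lineage, measured from the root.
    locations : (n, d) sample coordinates.
    """
    S = np.asarray(S, dtype=float)
    s_anc = np.asarray(s_anc, dtype=float)
    X = np.asarray(locations, dtype=float)
    ones = np.ones(X.shape[0])
    # Generalized least squares estimate of the root location.
    root = (ones @ np.linalg.solve(S, X)) / (ones @ np.linalg.solve(S, ones))
    # Conditional-normal weights S^{-1} s_anc (S is symmetric).
    weights = np.linalg.solve(S, s_anc)
    return root + weights @ (X - root)


# Hypothetical toy tree ((A, B), C): A and B coalesce 2 generations ago,
# root 10 generations ago, so the shared times are as follows.
S_toy = np.array([[10.0, 8.0, 0.0],
                  [8.0, 10.0, 0.0],
                  [0.0, 0.0, 10.0]])
xy_km = np.array([[0.0, 0.0], [10.0, 5.0], [120.0, -40.0]])  # made-up coordinates

# Ancestor of sample A one generation ago: it shares 9 generations with A,
# 8 with B (up to the A-B coalescence), and none with C.
print(locate_ancestor(S_toy, [9.0, 8.0, 0.0], xy_km))
```

Two sanity checks follow from the formula: an "ancestor" at time zero on a sample's own lineage is placed exactly at that sample, and an ancestor at the root is placed at the estimated root location. The further back the ancestor, the smaller its shared times and the more the estimate shrinks toward the root, which is one way to see Reviewer #2's point that the data carry little information about locations in deeper time.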