Simultaneous Inference of Past Demography and Selection from the Ancestral Recombination Graph under the Beta Coalescent
Abstract
The reproductive mechanism of a species is a key driver of genome evolution. The standard Wright-Fisher model for the reproduction of individuals in a population assumes that each individual produces a number of offspring that is negligible compared to the total population size. Yet many species of plants, invertebrates, prokaryotes or fish exhibit neutrally skewed offspring distributions or strong selection events, in which a few individuals produce a number of offspring of up to the same order of magnitude as the population size. As a result, the genealogy of a sample is characterized by multiple individuals (more than two) coalescing simultaneously to the same common ancestor. Current methods developed to detect such multiple merger events do not account for complex demographic scenarios or recombination, and require large sample sizes. We tackle these limitations by developing two novel and different approaches to infer multiple merger events from sequence data or the ancestral recombination graph (ARG): a sequentially Markovian coalescent (SMβC) and a graph neural network (GNNcoal). We first demonstrate the accuracy of our methods in estimating the multiple merger parameter and past demographic history using simulated data under the β-coalescent model. Second, we show that our approaches can also recover the effect of positive selective sweeps along the genome. Finally, we are able to distinguish skewed offspring distributions from selection while simultaneously inferring the past variation of population size. Our findings stress the aptitude of neural networks to leverage information from the ARG for inference but also the urgent need for more accurate ARG inference approaches.
Modelling the evolution of complete genome sequences in populations requires accounting for the recombination process, as a single tree can no longer describe the underlying genealogy. The sequentially Markov coalescent (SMC, McVean and Cardin 2005; Marjoram and Wall 2006) approximates the standard coalescent with recombination process and permits estimating population genetic parameters (e.g., population sizes, recombination rates) using population genomic datasets. As such datasets become available for an increasing number of species, more fine-tuned models are needed to encompass the diversity of life cycles of organisms beyond the model species on which most methods have been benchmarked.
The work by Korfmann et al. (2024) represents a significant step forward as it accounts for multiple mergers in SMC models. Multiple merger models allow simultaneous coalescence events, in which more than two lineages find a common ancestor in a given generation. This feature is not allowed in standard coalescent models and may result from selection or skewed offspring distributions, conditions likely met by a broad range of species, particularly microbial ones.
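To make the notion of a multiple merger concrete, the following minimal Python sketch computes merger rates under a Beta coalescent. It assumes the common Beta(2 − α, α) parametrization with 1 ≤ α < 2, in which any given set of k out of b lineages merges at rate B(k − α, b − k + α) / B(2 − α, α); this parametrization and the function names are illustrative assumptions, not taken from the recommendation text. As α approaches 2, almost all events are pairwise, as in the standard coalescent.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def merger_rate(b: int, k: int, alpha: float) -> float:
    """Rate at which one given set of k out of b lineages merges.

    Assumes a Beta(2 - alpha, alpha) coalescent (hypothetical choice of
    parametrization for this illustration), with 1 <= alpha < 2.
    """
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    if not 1 <= alpha < 2:
        raise ValueError("need 1 <= alpha < 2")
    return math.exp(
        log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha)
    )


def merger_size_distribution(b: int, alpha: float) -> dict[int, float]:
    """Probability that the next event among b lineages merges k of them."""
    weights = {
        k: math.comb(b, k) * merger_rate(b, k, alpha) for k in range(2, b + 1)
    }
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


if __name__ == "__main__":
    # Compare a strongly skewed case with a nearly standard-coalescent case.
    for alpha in (1.2, 1.9, 1.999):
        dist = merger_size_distribution(10, alpha)
        print(alpha, round(dist[2], 4), round(1 - dist[2], 4))
```

The second printed column is the probability that the next event among ten lineages is a pairwise merger, and the third is the probability that it involves three or more lineages. Lower values of α shift weight toward the larger mergers.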
Yet, this work goes beyond extending the SMC, as it introduces several methodological innovations. The "classical" SMC-based inference approaches rely on hidden Markov models to compute the likelihood of the data while efficiently integrating over the possible ancestral recombination graphs (ARG). Following other recent works (e.g., Gattepaille et al. 2016), Korfmann et al. propose to separate the ARG inference from model parameter estimation under maximum likelihood (ML). They introduce a procedure where the ARG is first reconstructed from the data and then taken as input in the model fitting step. While this approach does not permit accounting for the uncertainty in the ARG reconstruction (which is typically large), it potentially allows for the extraction of more information from the ARG, such as the occurrence of multiple merging events. Moving away from maximum likelihood inference, the authors also trained a graph neural network (GNN) on simulated ARGs, introducing a new, flexible way to estimate population genomic parameters.
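As a toy illustration of the kind of information a reconstructed ARG exposes, the sketch below counts multiple merger nodes across a sequence of local trees. The representation (each local tree as a child-to-parent dictionary with node IDs shared along the genome) and the function names are assumptions made for this example; this is not the authors' procedure, and it ignores the reconstruction uncertainty mentioned above.

```python
from collections import Counter


def merger_sizes(parent_of: dict[int, int]) -> Counter:
    """For one local tree, count internal nodes by their number of children.

    `parent_of` maps each non-root node ID to its parent ID (hypothetical
    toy format). A key of 2 is a pairwise merger, 3 or more a multiple
    merger; a key of 1 would be a unary node.
    """
    children_per_node = Counter(parent_of.values())
    return Counter(children_per_node.values())


def count_multiple_mergers(trees: list[dict[int, int]]) -> int:
    """Count distinct nodes with more than two children in any local tree."""
    multiple = set()
    for parent_of in trees:
        children_per_node = Counter(parent_of.values())
        multiple.update(
            node for node, n in children_per_node.items() if n > 2
        )
    return len(multiple)


# Two local trees over samples 0..3 that share ancestral node 4.
# In the first tree, node 4 is the parent of samples 0, 1 and 2 (a triple merger).
tree_left = {0: 4, 1: 4, 2: 4, 3: 5, 4: 5}
# After a recombination event, sample 2 joins sample 3 under node 6 instead.
tree_right = {0: 4, 1: 4, 2: 6, 3: 6, 4: 7, 6: 7}

if __name__ == "__main__":
    print(merger_sizes(tree_left))
    print(count_multiple_mergers([tree_left, tree_right]))
```

Counting a node once across all local trees avoids inflating the tally when the same ancestor spans many adjacent genomic segments.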
The authors used simulations under a beta-coalescent model with diverse demographic scenarios and showed that the ML and GNN approaches introduced can reliably recover the simulated parameter values. They further show that when the true ARG is given as input, the GNN outperforms the ML approach, demonstrating its promising power as ARG reconstruction methods improve. In particular, they showed that trained GNNs can disentangle the effects of selective sweeps and skewed offspring distributions while inferring past population size changes.
This work paves the way for new, exciting applications, though many questions remain to be answered. How frequent are multiple mergers? As the authors showed that these events "erase" the record of past demographic events, how many genomes are needed to conduct reliable inference, and can the methods computationally cope with the resulting (potentially large) amounts of required data? This is particularly intriguing as micro-organisms, prone to strong selection and skewed offspring distributions, also tend to carry smaller genomes.
References
Gattepaille L, Günther T, Jakobsson M. 2016. Inferring Past Effective Population Size from Distributions of Coalescent Times. Genetics 204:1191-1206.
https://doi.org/10.1534/genetics.115.185058
Korfmann K, Sellinger T, Freund F, Fumagalli M, Tellier A. 2024. Simultaneous Inference of Past Demography and Selection from the Ancestral Recombination Graph under the Beta Coalescent. bioRxiv, 2022.09.28.508873. ver. 5 peer-reviewed and recommended by Peer Community in Evolutionary Biology. https://doi.org/10.1101/2022.09.28.508873
Marjoram P, Wall JD. 2006. Fast "coalescent" simulation. BMC Genet. 7:16.
https://doi.org/10.1186/1471-2156-7-16
McVean GAT, Cardin NJ. 2005. Approximating the coalescent with recombination. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 360:1387-1393.
https://doi.org/10.1098/rstb.2005.1673
