Probabilities of tree topologies with temporal constraints and diversification shifts
Listed in: Evaluated articles (Peer Community in Evolutionary Biology)
Abstract
Dating the tree of life is a task far more complicated than merely determining the evolutionary relationships between species. It is therefore of interest to develop approaches able to deal with undated phylogenetic trees. The main result of this work is a method to compute probabilities of undated phylogenetic trees under Markovian diversification models by constraining some of the divergence times to belong to given time intervals and by allowing diversification shifts on certain clades. If the diversification models considered are lineage-homogeneous, the time complexity of this computation is quadratic in the number of species of the phylogenetic tree and linear in the number of temporal constraints. The interest of this computation method is illustrated with three applications, namely, to compute the distribution of the divergence times of a tree topology with temporal constraints, to directly sample the divergence times of a tree topology, and to test for a diversification shift at a given clade.
Article activity feed
Phylogenetic trees can be used to extract information about the process of diversification that has generated them. The most common approach to this inference is to rely on a likelihood, defined here as the probability of generating a dated tree T given a diversification model (e.g. a birth-death model), and then to apply standard maximum-likelihood estimation. This idea has been explored extensively in the context of so-called diversification studies, with many variants of the models and of the questions being asked (diversification rates shifting at certain time points or in the ancestors of particular subclades, trait-dependent diversification rates, etc.).
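As a minimal sketch of the likelihood approach just described, the simplest case is a pure-birth (Yule) model, i.e. a birth-death model with no extinction. This is an illustrative assumption, not a model singled out by the recommendation. Conditioning on the crown of a dated binary tree with n tips, the likelihood is proportional to λ^(n-2) exp(-λS), where S is the sum of all branch lengths, so the maximum-likelihood rate is (n-2)/S. The function names and the list-of-branch-lengths input are hypothetical.

```python
import math


def count_tips(branch_lengths):
    """A rooted binary tree with n tips has 2n - 2 branches below the crown."""
    return (len(branch_lengths) + 2) // 2


def yule_log_likelihood(birth_rate, branch_lengths):
    """Pure-birth log-likelihood of a dated tree, conditioned on the crown.

    Additive terms that do not depend on birth_rate are dropped.
    Assumption: no extinction and complete sampling of extant species.
    """
    n_tips = count_tips(branch_lengths)
    total_length = sum(branch_lengths)
    return (n_tips - 2) * math.log(birth_rate) - birth_rate * total_length


def yule_mle(branch_lengths):
    """Maximum-likelihood speciation rate: (n - 2) / total branch length."""
    return (count_tips(branch_lengths) - 2) / sum(branch_lengths)


# Dated tree ((A:1,B:1):1,C:2): four branches, three tips, total length 5.
branches = [1.0, 1.0, 1.0, 2.0]
rate_hat = yule_mle(branches)
```

The estimate depends on the dates only through the total branch length, which is why errors in the dating step propagate directly to the estimated rate.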
However, all this assumes that the dated tree T is known without error. In practice, trees (that is, both the tree topology and the divergence times) are inferred from DNA sequences, possibly combined with fossil information for calibrating and informing the divergence times. Molecular dating is a delicate exercise, however, and much more so than reconstructing the tree topology. In particular, a misspecified model for the relaxed molecular clock, or a misspecified prior, can have a substantial impact on the estimation of divergence dates, which in turn could severely mislead the inference about the underlying diversification process. This raises the following question: would it be possible to conduct inference and testing of diversification models without having to go through the dangerous step of molecular dating?
In his article "Probabilities of tree topologies with temporal constraints and diversification shifts" [1], Gilles Didier introduces a recursive method for computing the probability of a tree topology under some diversification model of interest, without knowledge of the exact dates, but only with interval constraints on the dates of some of the nodes of the tree. Such interval constraints, which are derived from fossil knowledge, are typically used for molecular dating: they provide the calibrations for the relaxed clock analysis. What Gilles Didier essentially proposes is to use them in combination with the tree topology only, thus bypassing the need to estimate divergence times before fitting a diversification model to a phylogenetic tree.
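To give a concrete sense of what a probability of an undated topology looks like, here is a sketch of the simplest special case only: a pure-birth (Yule) model with no temporal constraints and no diversification shift. This is not Didier's recursive algorithm, which handles interval constraints and shifts. In this special case the probability of a labeled rooted binary topology with n tips has the classical closed form 2^(n-1)/n! times the product, over internal nodes v, of 1/(n_v - 1), where n_v is the number of tips below v. The nested-tuple tree encoding and the function names are assumptions made for the example.

```python
from fractions import Fraction
from math import factorial


def tips_and_product(tree):
    """Return (number of tips, product over internal nodes of 1/(n_v - 1)).

    A tree is either a tip label (any non-tuple value) or a pair
    (left_subtree, right_subtree).
    """
    if not isinstance(tree, tuple):
        return 1, Fraction(1)
    left, right = tree
    n_left, p_left = tips_and_product(left)
    n_right, p_right = tips_and_product(right)
    n = n_left + n_right
    return n, p_left * p_right / (n - 1)


def labeled_topology_probability(tree):
    """Probability of a labeled topology under the unconstrained Yule model."""
    n, product = tips_and_product(tree)
    return Fraction(2 ** (n - 1), factorial(n)) * product


caterpillar = ((("A", "B"), "C"), "D")
balanced = (("A", "B"), ("C", "D"))
p_caterpillar = labeled_topology_probability(caterpillar)
p_balanced = labeled_topology_probability(balanced)
```

With four tips there are 12 labeled caterpillar topologies and 3 labeled balanced ones, so the two values above should satisfy 12 * p_caterpillar + 3 * p_balanced = 1, which is a quick sanity check on the formula.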
This article, which is primarily a mathematical and algorithmic contribution, is complemented by several applications: testing for a diversification shift in a given subclade of the phylogeny, based only on the (undated) tree topology with interval constraints on some of its internal nodes; computing the age distribution of each node; and sampling from the joint distribution of node ages, conditional on the interval constraints. The test for the presence of a diversification shift is particularly interesting: an application to simulated data (without any interval constraints in that case) suggests that the method based on the undated tree performs about as well as the classical method based on a dated tree, even when the classical approach is granted perfect knowledge of the dates, whereas in practice it relies on potentially biased estimates. Finally, an application to a well-known example (rate shifts in the cetacean phylogeny) is presented.
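For comparison, one common form of a dated-tree test for a shift is a likelihood-ratio test. The sketch below assumes a pure-birth model and compares a single rate for the whole tree against separate rates for the focal clade and the rest. It is an illustration of the general idea, not the test implemented in [1] nor necessarily the exact classical test used in its simulations. Each part of the tree is summarized by a hypothetical pair (number of speciation events, total branch length), and the p-value relies on the asymptotic chi-square approximation with one degree of freedom.

```python
import math


def max_log_likelihood(n_events, total_length):
    """Maximized pure-birth log-likelihood for a set of branches.

    Terms independent of the rate are dropped; they cancel in the ratio.
    """
    if n_events == 0:
        return 0.0  # the estimated rate is 0, so the likelihood term is 1
    rate = n_events / total_length
    return n_events * math.log(rate) - n_events


def shift_likelihood_ratio_test(clade, rest):
    """Test one shared rate against distinct rates in the clade and the rest.

    clade and rest are (n_events, total_branch_length) pairs.
    Returns (test statistic, approximate p-value).
    """
    pooled_events = clade[0] + rest[0]
    pooled_length = clade[1] + rest[1]
    log_l0 = max_log_likelihood(pooled_events, pooled_length)
    log_l1 = max_log_likelihood(*clade) + max_log_likelihood(*rest)
    statistic = max(2.0 * (log_l1 - log_l0), 0.0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical summaries: the clade speciates faster than the rest.
stat_shift, p_shift = shift_likelihood_ratio_test((20, 10.0), (5, 20.0))
# Hypothetical summaries with the same rate everywhere.
stat_null, p_null = shift_likelihood_ratio_test((5, 10.0), (10, 20.0))
```

Both inputs to this test come from the dated tree, so any bias in the estimated divergence times feeds into the branch-length totals and hence into the test statistic.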
This article thus represents a particularly meaningful contribution to the methodology of diversification studies, and also to molecular dating itself. It is a well-known problem in molecular dating that computing and sampling from the conditional distributions of node ages given fossil constraints, and more generally understanding and visualizing how interval constraints on some nodes of the tree affect the distribution at other nodes, is particularly difficult. For that reason, the algorithmic routines presented in the article will be useful in this context as well.
References
[1] Didier, G. (2020) Probabilities of tree topologies with temporal constraints and diversification shifts. bioRxiv, 376756, ver. 4 peer-reviewed and recommended by PCI Evolutionary Biology. doi: 10.1101/376756
