Coalescent-based branch length estimation improves dating of species trees
Abstract
Species trees need to be dated for many downstream applications. Typical molecular dating methods take a phylogenetic tree with branch lengths in substitution units as well as a set of calibrations as input and convert the branch lengths of the species tree to the unit of time while being consistent with the pre-specified calibrations. When dating species trees from multi-locus genome-scale datasets, the branch lengths and sometimes the topology of the species tree are estimated using concatenation. However, concatenation does not address gene tree heterogeneity across the genome. While Bayesian dating methods can address some forms of gene tree heterogeneity, such as incomplete lineage sorting, they are not scalable to large datasets. In this paper, we introduce a new scalable pipeline for dating species trees that addresses gene tree discordance for both topology and branch length estimation. The pipeline uses discordance-aware methods that account for incomplete lineage sorting for estimating the topology and branch lengths and maximum likelihood-based methods for the dating step. Our simulation study on datasets with gene tree discordance shows that this pipeline produces more accurate and less biased dates than pipelines that use concatenation or unpartitioned Bayesian methods. Furthermore, it is substantially more scalable and can handle datasets with thousands of species and genes. Our results on two biological datasets show that this new pipeline improves the inference of node ages and branch lengths for some nodes, in particular extant taxa, and improves the downstream diversification analysis.
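As a rough illustration of what the dating step does, the sketch below converts branch lengths from substitution units to time units using a single calibration. It is not the pipeline described above: it assumes a strict molecular clock (one rate for the whole tree) and one exactly known node age, whereas the maximum likelihood-based dating methods mentioned in the abstract handle rate variation and multiple calibrations. The toy tree, the node names, the calibration age of 50 time units, and the function names are all hypothetical.

```python
from statistics import mean

# Hypothetical toy tree: child -> (parent, branch length in substitutions per site).
BRANCHES = {
    "A": ("AB", 0.02),
    "B": ("AB", 0.02),
    "AB": ("root", 0.03),
    "C": ("root", 0.05),
}


def tips_below(node, branches):
    """Return the names of all tips descending from `node`."""
    children = [child for child, (parent, _) in branches.items() if parent == node]
    if not children:
        return [node]
    return [tip for child in children for tip in tips_below(child, branches)]


def distance_up(tip, ancestor, branches):
    """Sum the branch lengths on the path from `tip` up to `ancestor`."""
    total = 0.0
    node = tip
    while node != ancestor:
        parent, length = branches[node]
        total += length
        node = parent
    return total


def strict_clock_dates(branches, calibrated_node, calibrated_age):
    """Rescale branch lengths to time under a strict clock (an assumption).

    The rate is the mean substitution distance from the calibrated node to its
    tips, divided by the calibrated age. Every branch is then divided by that rate.
    """
    heights = [
        distance_up(tip, calibrated_node, branches)
        for tip in tips_below(calibrated_node, branches)
    ]
    rate = mean(heights) / calibrated_age  # substitutions per site per time unit
    time_branches = {child: length / rate for child, (_, length) in branches.items()}
    return rate, time_branches


# Assumed calibration: the root is 50 time units old.
rate, time_branches = strict_clock_dates(BRANCHES, "root", 50.0)
for child, duration in time_branches.items():
    print(f"{child}: {duration:.1f} time units")
```

Because every branch is divided by the same rate, any error in the substitution-unit branch lengths carries over directly into the dates, which is why the way those lengths are estimated (concatenation versus discordance-aware methods) matters for dating.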