DIST: Distance-based Inference of Species Trees

Abstract

Inferring species trees from concatenated loci is often criticised for failing to account for gene tree discordance, particularly when character-based methods are used. However, this criticism does not apply to distance-based concatenation trees, which can be shown to be statistically consistent even in anomaly zones. Building on this insight, we introduce DIST (Distance-based Inference of Species Trees), an intuitive and scalable method that infers species trees from population-level distance matrices containing multi-locus estimates of D_xy, F_ST or coalescence units (τ). DIST derives these values from between-individual sequence dissimilarity estimates, E(p), using basic equations from coalescence theory. Under certain conditions, DIST can also quantify gene tree discordance and distinguish whether it arises from gene flow or from incomplete lineage sorting alone. While conceptually related to more sophisticated summary methods, DIST differs in that it does not seek the species tree that best explains a set of gene trees. Instead, it searches for the species tree that best explains an average gene tree, whose branch lengths all reflect mean coalescence time, E(t). Although this average gene tree is rarely observed empirically, it is approximated by an individual-level distance-based tree, traditionally referred to as a 'tree of individuals'. The DIST algorithm is implemented in the R package SambaR, which now accepts input in the form of pairwise E(p) estimates.
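The abstract does not spell out which coalescent equations DIST uses, so the following is only a minimal sketch of how population-level D_xy, F_ST and τ can be derived from a matrix of between-individual E(p) values. It assumes a neutral, infinite-sites model without gene flow in which both populations and their ancestor share the same constant effective size N. Under those assumptions E(p) = 2μE(t), within-population diversity is π = 4Nμ, and two populations that split T generations ago give D_xy = 2μ(T + 2N), so τ = T/(2N) = D_xy/π − 1 and Hudson-style F_ST = 1 − π/D_xy = τ/(1 + τ). The function name `population_summaries` and its input layout are hypothetical and do not reflect the SambaR interface.

```python
"""Hypothetical sketch (not the SambaR API): D_xy, F_ST and tau from pairwise E(p).

Assumes a neutral, infinite-sites model with no gene flow, in which each
population and its ancestor share the same constant effective size N, so that
pi = 4*N*mu, D_xy = 2*mu*(T + 2N), tau = T/(2N) = D_xy/pi - 1 and
F_ST = 1 - pi/D_xy = tau/(1 + tau).
"""
from itertools import combinations


def population_summaries(dist, labels):
    """Summarise an individual-level dissimilarity matrix by population.

    dist   -- square list of lists of between-individual dissimilarities E(p)
    labels -- population label of each individual, in the same order as dist
    Returns (pi, results): within-population diversity per population, and a
    dict keyed by population pair holding D_xy, F_ST and tau.
    """
    pops = sorted(set(labels))
    idx = {p: [i for i, lab in enumerate(labels) if lab == p] for p in pops}

    def mean_between(a, b):
        # D_xy: mean dissimilarity over all cross-population pairs.
        vals = [dist[i][j] for i in idx[a] for j in idx[b]]
        return sum(vals) / len(vals)

    def mean_within(a):
        # pi: mean dissimilarity over distinct pairs within one population.
        pairs = list(combinations(idx[a], 2))
        if not pairs:
            raise ValueError(f"population {a!r} needs at least two individuals")
        return sum(dist[i][j] for i, j in pairs) / len(pairs)

    pi = {p: mean_within(p) for p in pops}
    results = {}
    for a, b in combinations(pops, 2):
        dxy = mean_between(a, b)
        pi_w = (pi[a] + pi[b]) / 2      # average within-population diversity
        fst = 1 - pi_w / dxy            # Hudson-style F_ST
        tau = dxy / pi_w - 1            # split time in coalescent units (2N generations)
        results[(a, b)] = {"dxy": dxy, "fst": fst, "tau": tau}
    return pi, results
```

For example, with two individuals per population, a within-population dissimilarity of 0.010 and a between-population dissimilarity of 0.030, the sketch gives D_xy = 0.030, F_ST ≈ 0.67 and τ ≈ 2. Under the equal-size assumption these three statistics are monotone transformations of one another, which is why any of them could serve as the entries of a population-level distance matrix; real data with unequal population sizes or gene flow would break this simple correspondence.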