Lower Bounds on the Sample Complexity of Species Tree Estimation when Substitution Rates Vary Across Loci

Abstract

In this paper we analyze the effect of substitution rate heterogeneity on the sample complexity of species tree estimation. We consider a model based on the multi-species coalescent (MSC), with the addition that gene trees exhibit random i.i.d. rates of substitution. Our first result is a lower bound on the number of loci needed to distinguish 2-leaf trees (i.e., pairwise distances) with high probability when substitution rates satisfy a growth condition. In particular, we show that distinguishing two distances that differ by length $f$ with high probability requires $O(f^{-2})$ loci, a significantly higher bound than in the constant-rate case. The second main result is a lower bound on the amount of data needed to reconstruct a 3-leaf species tree with high probability when mutation rates are gamma distributed. In this case as well, we show that the number of gene trees must grow as $O(f^{-2})$.
