Moments of the length of sample genealogy and their applications
Abstract
Underlying many important statistical properties of a sample of DNA sequences is the total branch length L of the sample’s genealogy. A classic example is the number K of mutations on the genealogy, whose expectation and variance are simple functions of the expectation and variance of L. However, higher moments of L and related quantities have received relatively little attention despite their potential utility. This paper systematically investigates the properties of L and its relationship with the total length of branches having a specified number of descendants, and demonstrates their usefulness through several applications under the constant-in-state model, an extension of the Wright–Fisher model with constant effective population size. Specifically, recurrence relations are derived for the power moments of L and for the expectations of products of powers of L and the lengths of branches of specified sizes. A closed-form expression for the power moments of L is also obtained. These results allow the large-sample behavior of L to be examined through its skewness and kurtosis, revealing that L is not asymptotically normal under the constant-size Wright–Fisher model, although approximate normality emerges in rapidly growing populations. Moreover, the power moments of L provide a more straightforward route to the higher moments of K and yield a novel approach for computing the distribution of K. The associated mixed moments similarly lead to a novel method for calculating the probability of observing a single mutation with a specified number of descendants.
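As a rough illustration of two claims in the abstract, the sketch below computes the first four cumulants of L in the simplest special case only, the standard constant-size coalescent. It does not reproduce the paper's recurrence relations or its closed-form expression. It assumes the textbook representation L = sum over k = 2..n of k·T_k, with independent exponential coalescence times T_k of rate k(k-1)/2, time scaled so that a pair of lineages coalesces at rate 1, and Poisson mutations at rate theta/2 per unit of branch length. The function names are hypothetical and chosen for this example.

```python
from math import factorial


def cumulants_of_L(n, max_order=4):
    """First `max_order` cumulants of L for a sample of size n.

    Assumes L = sum_{k=2}^{n} k * T_k with independent T_k ~ Exp(k(k-1)/2).
    Then k * T_k is exponential with rate (k-1)/2, whose r-th cumulant is
    (r-1)! * (2/(k-1))**r, and cumulants of independent terms add.
    """
    return [
        factorial(r - 1) * sum((2.0 / (k - 1)) ** r for k in range(2, n + 1))
        for r in range(1, max_order + 1)
    ]


def skewness_and_excess_kurtosis(n):
    """Standardized third and fourth cumulants of L."""
    _, k2, k3, k4 = cumulants_of_L(n, 4)
    return k3 / k2 ** 1.5, k4 / k2 ** 2


def mean_and_variance_of_K(n, theta):
    """Mean and variance of the number of mutations K.

    Assumes K given L is Poisson with mean (theta/2) * L, so that
    E[K] = (theta/2) E[L] and Var[K] = (theta/2) E[L] + (theta/2)**2 Var[L].
    """
    mean_L, var_L = cumulants_of_L(n, 2)
    half = theta / 2.0
    return half * mean_L, half * mean_L + half ** 2 * var_L


if __name__ == "__main__":
    for n in (2, 10, 100, 10_000):
        skew, exkurt = skewness_and_excess_kurtosis(n)
        print(n, round(skew, 4), round(exkurt, 4))
```

Under these assumptions, n = 2 gives a skewness of 2 and an excess kurtosis of 6, since L is then a single exponential variable. As n grows, both quantities settle near 1.14 and 2.4 rather than decaying to zero, which is consistent with the abstract's statement that L is not asymptotically normal in the constant-size case. The growing-population case discussed in the abstract is not covered by this sketch.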