Moments of the length of sample genealogy and their applications


Abstract

Underlying many important statistical properties of a sample of DNA sequences is the total branch length L of the sample’s genealogy. A classic example is the number K of mutations on the genealogy, whose expectation and variance are simple functions of the expectation and variance of L. However, higher moments of L and related quantities have received relatively little attention despite their potential utility. This paper systematically investigates the properties of L and its relationship with the total length of branches with a specified number of descendants, and demonstrates the usefulness of these results through several applications under the constant-in-state model, which is an extension of the Wright–Fisher model with constant effective population size. Specifically, recurrence relations are derived for the power moments of L and for the expectations of products of powers of L and the lengths of branches of specified sizes. A closed-form expression for the power moments of L is also obtained. These results allow the large-sample behavior of L to be examined through its skewness and kurtosis, revealing that L is not asymptotically normal under the constant-size Wright–Fisher model, although approximate normality emerges in rapidly growing populations. Moreover, the power moments of L provide a more straightforward route to deriving higher moments of K and yield a novel approach for computing the distribution of K. The associated mixed moments similarly lead to a novel method for calculating the probability of a single mutation with a specified number of descendants.
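As a rough illustration of how moments of L feed into moments of K and into the shape of L, the sketch below uses only the standard constant-size coalescent, not the constant-in-state model or the recurrence relations of the paper. It assumes that the time T_k with k lineages is exponential with rate k(k-1)/2, that L is the sum of k·T_k, and that mutations fall as a Poisson process with rate θ/2 per unit of branch length. The function names (`harmonic_sum`, `cumulant_L`, `moments_K`, `shape_L`) and this time scaling are illustrative choices, not notation from the paper.

```python
from math import factorial


def harmonic_sum(n, r):
    """Return sum_{j=1}^{n-1} 1 / j**r for a sample of size n."""
    return sum(1.0 / j**r for j in range(1, n))


def cumulant_L(n, r):
    """r-th cumulant of L under the assumed standard coalescent.

    L is a sum of independent terms k*T_k, each exponential with
    rate (k-1)/2, and an exponential with rate lam has r-th cumulant
    (r-1)! / lam**r. Cumulants of independent terms add.
    """
    return factorial(r - 1) * 2**r * harmonic_sum(n, r)


def moments_K(n, theta):
    """Mean and variance of K, assuming K | L ~ Poisson(theta/2 * L)."""
    mean_L = cumulant_L(n, 1)
    var_L = cumulant_L(n, 2)
    mean_K = (theta / 2) * mean_L
    # Law of total variance: E[Var(K|L)] + Var(E[K|L]).
    var_K = (theta / 2) * mean_L + (theta / 2) ** 2 * var_L
    return mean_K, var_K


def shape_L(n):
    """Skewness and excess kurtosis of L from its cumulants."""
    k2 = cumulant_L(n, 2)
    skewness = cumulant_L(n, 3) / k2**1.5
    excess_kurtosis = cumulant_L(n, 4) / k2**2
    return skewness, excess_kurtosis


if __name__ == "__main__":
    print(moments_K(10, theta=5.0))
    for n in (10, 100, 10_000):
        print(n, shape_L(n))
```

Under these assumptions the skewness and excess kurtosis of L do not shrink toward zero as n grows, which is consistent with the abstract's statement that L is not asymptotically normal in the constant-size case. The growing-population case discussed in the abstract is not covered by this sketch.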
