Inferring drift, genetic differentiation, and admixture graphs from low-depth sequencing data

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

A number of popular methods for inferring the evolutionary relationship between populations require essentially two components: First, they require estimates of f 2 -statistics, or some quantity that is a linear combination of these. Second, they require estimates of the variability of the statistic in question. Examples of methods in this class include qpGraph and TreeMix.

It is known, however, that these statistics are biased when based on genotype calls at low depth. Moreover, as we show, this leads to downstream inference of significantly distorted trees. To solve this problem, we demonstrate how to accurately and efficiently compute a broad class of statistics from low-depth whole-genome sequencing data, including estimates of their standard errors, by using the site frequency spectrum. In particular, we focus on f 2 and the sample covariance of allele frequencies to show how this method leads to accurate estimate of drift when fitting trees using qpGraph and TreeMix with low-depth data. However, the same considerations lead to uncertainty estimates for a variety of other statistics, including heterozygosity, kinship estimates (e.g. King), and quantities relating to genetic differentiation such as F st and D xy .

Article activity feed