Low accuracy of complex admixture graph inference from f -statistics

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

F -statistics are commonly used to assess hybridization, admixture or introgression between populations or deeper evolutionary lineages. Their fast calculation from allele frequencies allows for rapid downstream admixture graph inference. One frequently overlooked assumption of the f 4 -test is a constant substitution rate. This assumption is typically questionable when comparing distantly-related lineages. Using simulations we find that rate variation across lineages decreases the accuracy of the f 4 -test to detect the presence of reticulations in large data sets or with high average mutation rate. But when f -statistics are combined to infer an admixture graph, rate variation across lineages has a small effect on accuracy. Network inference was accurate on a simple network with 1 reticulation only, but extremely inaccurate to infer a complex network with 4 reticulations, even from large data sets and without rate variation. Yet in both cases, the network’s major tree was inferred reliably. Rate variation significantly increased the distance between the true and closest estimated network, the score gap between the true and best-scoring network, and the rate of incorrectly rejecting 1 reticulation as adequate, under our simple network. We propose that identifiability, or lack thereof is underlying the contrasting results between our simple and complex networks. Our findings suggest that the major tree is one feature that might be identifiable from f -statistics. In practice, we recommend evaluating a large set of top-scoring networks inferred from f -statistics, and even so, using caution in assuming that the true network is part of this set when inferred networks are complex. The extent of rate variation should be assessed in the system under study, especially at deeper time scales, in systems with rapid molecular evolution or with fast-evolving loci.

Article activity feed