Low accuracy of complex admixture graph inference from f-statistics

Abstract

F-statistics are commonly used to assess hybridization, admixture or introgression between populations or deeper evolutionary lineages. Using simulations, we find that network complexity has a large impact on the accuracy of inferring network structure from f-statistics. Networks were recovered accurately when they had one reticulation, or when their reticulations were in “large” cycles of at least 4 nodes in all subnetworks. But accuracy was extremely poor for complex networks, in which a reticulation is part of a small cycle of only 3 nodes in some subnetwork. Accuracy also decreased as the number of reticulations and the network level increased. For these complex networks, accuracy was low even with large data sets, a low mutation rate, a molecular clock, and many top-scoring graphs retained. Yet in all cases, the network’s major tree was recovered reliably. When the molecular clock was violated, the f4-test tended to falsely detect reticulation in large data sets or under a high mutation rate. Rate variation also reduced the accuracy of network inference and increased the rate of falsely rejecting one reticulation as adequate. We propose that identifiability, or lack thereof, underlies the contrasting recoverability of simple and complex networks. Our findings suggest that the major tree is one feature that might be estimable from f-statistics. In practice, we recommend evaluating a large set of top-scoring networks inferred from f-statistics and, even so, using caution in assuming that the true network is part of this set. The extent of rate variation should be assessed in the system under study, especially at deeper time scales or when using fast-evolving loci.
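To make the f4-test concrete, here is a minimal sketch in Python/NumPy, assuming population allele frequencies are already estimated at each site. It computes f4(A,B;C,D) as the mean over sites of (pA - pB)(pC - pD) and derives a z-score from a simple, unweighted delete-one-block jackknife. Under a tree in which A,B and C,D form two cherries the expected f4 is zero, so a large |z| is read as evidence of reticulation. The function names, the equal-size contiguous blocks, and the toy Gaussian-drift simulation (with optional gene flow from C into B) are hypothetical illustrations; this is not the authors' simulation pipeline, and it is not the weighted jackknife used by tools such as ADMIXTOOLS.

```python
import numpy as np


def f4_per_site(pA, pB, pC, pD):
    """Per-site contributions to f4(A,B;C,D) = mean of (pA - pB)(pC - pD)."""
    return (np.asarray(pA) - np.asarray(pB)) * (np.asarray(pC) - np.asarray(pD))


def f4_test(pA, pB, pC, pD, n_blocks=50):
    """Return (f4 estimate, jackknife standard error, z-score).

    Assumption: sites are split into n_blocks contiguous, roughly equal
    blocks; real analyses use genomic blocks to absorb linkage. The standard
    error is the unweighted delete-one-block jackknife.
    """
    per_site = f4_per_site(pA, pB, pC, pD)
    blocks = np.array_split(per_site, n_blocks)
    sums = np.array([b.sum() for b in blocks])
    counts = np.array([b.size for b in blocks])
    estimate = sums.sum() / counts.sum()
    # f4 re-estimated with each block left out in turn.
    loo = (sums.sum() - sums) / (counts.sum() - counts)
    g = len(blocks)
    se = np.sqrt((g - 1) / g * np.sum((loo - loo.mean()) ** 2))
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


def simulate_quartet(n_sites, admix=0.0, drift_sd=0.1, seed=0):
    """Hypothetical toy drift model for the tree ((A,B),(C,D)).

    Each branch adds Gaussian noise to allele frequencies, clipped to [0, 1].
    If admix > 0, population B draws a fraction `admix` of its ancestry
    from C, adding one reticulation between the two cherries.
    """
    rng = np.random.default_rng(seed)

    def drift(p):
        return np.clip(p + rng.normal(0.0, drift_sd, size=p.shape), 0.0, 1.0)

    p0 = rng.uniform(0.2, 0.8, size=n_sites)
    pAB, pCD = drift(p0), drift(p0)
    pA, pB_tree = drift(pAB), drift(pAB)
    pC, pD = drift(pCD), drift(pCD)
    pB = (1.0 - admix) * pB_tree + admix * pC
    return pA, pB, pC, pD


if __name__ == "__main__":
    # Compare a tree-like quartet with one carrying 30% gene flow from C into B.
    for a in (0.0, 0.3):
        est, se, z = f4_test(*simulate_quartet(20_000, admix=a, seed=1))
        print(f"admix={a}: f4={est:.5f}, se={se:.5f}, z={z:.1f}")
```

In this toy model the tree-like quartet should give a z-score near zero, while gene flow from C into B pushes f4(A,B;C,D) below zero. Note that the sketch assumes independent sites and no rate variation; the abstract's point is precisely that such tests can mislead when those conditions fail, for example when the molecular clock is violated.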

Article activity feed