Low accuracy of complex admixture graph inference from f-statistics

Lauren E. Frankel
Cécile Ané

Abstract

F-statistics are commonly used to assess hybridization, admixture or introgression between populations or deeper evolutionary lineages. Using simulations, we find that network complexity had a large impact on the accuracy of inferring network structure from f-statistics. Networks recovered accurately had one reticulation, or had their reticulations in “large” cycles of at least 4 nodes in all subnetworks. But accuracy was extremely poor for complex networks, in which a reticulation is part of a small cycle of only 3 nodes in some subnetwork. Accuracy also decreased as the number of reticulations and the network level increased. For these networks, accuracy was low even from large data sets with low mutation rate, under a molecular clock, and when retaining many top-scoring graphs. Yet in all cases, the network’s major tree was recovered reliably. When the molecular clock was violated, the f₄-test tended to falsely detect the presence of reticulation in large data sets or under a high mutation rate. Rate variation also impacted network inference accuracy and increased the rate of falsely rejecting 1 reticulation as being adequate. We propose that identifiability, or lack thereof, underlies the contrasting recoverability between simple and complex networks. Our findings suggest that the major tree is one feature that might be estimable from f-statistics. In practice, we recommend evaluating a large set of top-scoring networks inferred from f-statistics, and even so, using caution in assuming that the true network is part of this set. The extent of rate variation should be assessed in the system under study, especially at deeper time scales, or when using fast-evolving loci.
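
For readers unfamiliar with the f₄-test mentioned in the abstract, the sketch below shows the standard definition of the statistic: the average over loci of (a − b)(c − d), where a, b, c, d are allele frequencies in four populations. Under a tree ((A,B),(C,D)) without admixture its expectation is zero, so a large |z| is read as evidence of reticulation. This is an illustration only, not the authors' pipeline: the function names `f4` and `f4_z_score` are hypothetical, and the z-score uses a simplified, unweighted delete-one-block jackknife over contiguous blocks of loci, which is an assumption made here for brevity.

```python
from math import sqrt


def f4(freqs_a, freqs_b, freqs_c, freqs_d):
    """Return f4(A, B; C, D): the mean over loci of (a - b) * (c - d)."""
    products = [
        (a - b) * (c - d)
        for a, b, c, d in zip(freqs_a, freqs_b, freqs_c, freqs_d)
    ]
    return sum(products) / len(products)


def f4_z_score(freqs_a, freqs_b, freqs_c, freqs_d, n_blocks=10):
    """Return f4 divided by a jackknife standard error.

    Assumption: an unweighted delete-one-block jackknife over contiguous
    blocks of loci, a simplification of what dedicated software does.
    """
    products = [
        (a - b) * (c - d)
        for a, b, c, d in zip(freqs_a, freqs_b, freqs_c, freqs_d)
    ]
    n = len(products)
    if n_blocks < 2 or n < n_blocks:
        raise ValueError("need at least 2 blocks and one locus per block")

    total = sum(products)
    estimate = total / n

    # Split loci into contiguous blocks of nearly equal size.
    bounds = [i * n // n_blocks for i in range(n_blocks + 1)]

    # Recompute the statistic with each block left out in turn.
    leave_one_out = []
    for start, end in zip(bounds, bounds[1:]):
        block_sum = sum(products[start:end])
        leave_one_out.append((total - block_sum) / (n - (end - start)))

    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    if variance == 0:
        raise ValueError("zero jackknife variance; z-score undefined")
    return estimate / sqrt(variance)
```

As the abstract cautions, a significant z-score from a test of this kind is not always evidence of reticulation: violations of the molecular clock can also produce one.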

Version published to 10.1101/2025.03.07.642126 on bioRxiv
Mar 12, 2025