Detecting Introgression in Shallow Phylogenies: How Minor Molecular Clock Deviations Lead to Major Inference Errors
Abstract
Recent theoretical and algorithmic advances in introgression detection, coupled with the growing availability of genome-scale data, have highlighted the widespread occurrence of interspecific gene flow across the tree of life. However, current methods largely depend on the molecular clock assumption, a questionable premise given empirical evidence of substitution rate variation across lineages. While such rate heterogeneity is known to compromise gene flow detection among divergent lineages, its impact on closely related taxa at shallow evolutionary timescales remains poorly understood, likely because these taxa are often assumed to adhere to a molecular clock. To address this gap, we combine theoretical analyses and simulations to evaluate the robustness of two widely used site-pattern methods, the D-statistic and HyDe, to rate variation across phylogenetic timescales. Our results show that both methods are highly sensitive to even minor deviations from the molecular clock at shallow timescales, complementing previous findings at deeper scales. Specifically, in young phylogenies (aged 3 × 10⁵ generations) with small population sizes, weak (17% difference) and moderate (33% difference) rate variation can inflate false-positive rates to as much as 35% and 100%, respectively, when site-pattern counts are drawn from a 500 Mb genome. Using a more distant outgroup intensifies these spurious signals. Our study demonstrates that summary tests for introgression are pervasively vulnerable to minor rate variation and underscores the critical need for advanced methods that can disentangle genuine introgression from false signals generated by rate heterogeneity.
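For readers unfamiliar with site-pattern tests, the sketch below shows how the D-statistic is computed from ABBA and BABA counts, D = (ABBA − BABA) / (ABBA + BABA), together with a simple z-score. It is an illustration under stated assumptions, not the authors' pipeline: the function names and the example counts are hypothetical, and the z-score treats sites as independent (ABBA ~ Binomial(n, 0.5) under the null), whereas published analyses typically use a block jackknife to account for linkage. The point it illustrates is that, for a fixed small asymmetry such as one a rate difference might create, the z-score grows with the square root of the number of informative sites, so genome-scale counts can turn a tiny bias into a "significant" signal.

```python
import math


def d_statistic(n_abba: int, n_baba: int) -> float:
    """Patterson's D from ABBA and BABA site-pattern counts (hypothetical helper)."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (n_abba - n_baba) / total


def naive_z(n_abba: int, n_baba: int) -> float:
    """z-score assuming independent sites.

    Under the null (no introgression, clock-like rates) ABBA ~ Binomial(n, 0.5),
    so ABBA - BABA has mean 0 and variance n. This ignores linkage between
    sites; a block jackknife would be used in practice.
    """
    n = n_abba + n_baba
    if n == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (n_abba - n_baba) / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical counts with the same ~4.8% excess of ABBA over BABA,
    # at two different numbers of informative sites.
    for abba, baba in [(1100, 1000), (11000, 10000)]:
        print(f"ABBA={abba} BABA={baba} "
              f"D={d_statistic(abba, baba):.4f} z={naive_z(abba, baba):.2f}")
```

Both rows report the same D, but the z-score of the larger sample is about √10 times higher, which is why a consistent bias from rate variation becomes easier to mistake for introgression as more of the genome is analyzed.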