GhostParser: A highly scalable phylogenomic approach for the identification of ghost introgression
Abstract
A growing body of empirical research shows that interspecific gene flow is a widespread biological force that shapes evolutionary histories across the Tree of Life. Computational approaches designed to detect introgression either employ full-likelihood methods, including Bayesian frameworks, to directly estimate phylogenetic networks, or use summary statistics derived from locus sequence alignments or estimated gene trees to map gene flow events onto the species tree. Many current methods have major shortcomings. The computationally scalable summary-statistic and pseudo-likelihood-based techniques may give erroneous results in the presence of so-called “ghost” introgression and rate variation between lineages. Full-likelihood methods, on the other hand, are more accurate but are not computationally tractable for large phylogenomic datasets. Here, we develop a novel summary statistic, based on the tree heights of different gene tree topologies, to reliably distinguish between sampled and ghost introgression events. We implemented this approach in the publicly accessible bioinformatic pipeline “GhostParser”. We demonstrate that GhostParser can accurately distinguish between scenarios of sampled and ghost introgression, even in the presence of rate variation between lineages. Our method matches the accuracy of the full-likelihood software Bayesian Phylogenetics and Phylogeography (BPP) on empirical datasets and outperforms BPP under our simulation conditions, in both cases in a small fraction of the computational time. We show that GhostParser is a scalable tool for the identification of different introgression patterns in phylogenomic datasets.