A method for massively scalable phylogenetic network inference

Abstract

Recent advancements in sequencing technologies have enabled large-scale phylogenomic analyses. While these analyses often rely on phylogenetic trees, increasing evidence suggests that non-treelike evolutionary events, such as hybridization and horizontal gene transfer, are prevalent in the evolutionary histories of many species, and in such cases, tree-based models are insufficient. Phylogenetic networks can capture such complex evolutionary histories, but current methods for accurately inferring them lack scalability. For instance, state-of-the-art model-based approaches are limited to around 30 taxa. Implicit network inference methods like NeighborNet and Consensus Networks are fast but lack biological interpretability. Here, we introduce a novel method called InPhyNet that merges a set of non-overlapping, independently inferred networks into a unified topology, achieving linear scalability while maintaining high accuracy under the multispecies network coalescent model. Our simulations show that InPhyNet matches the accuracy of SNaQ on datasets with 30 taxa while drastically decreasing the overall network inference time. InPhyNet is also more accurate than implicit network methods on large datasets while remaining computationally feasible. Re-analyzing a phylogeny of 1,158 land plants with InPhyNet, we recover known reticulate events and provide evidence for the controversial placement of the order Gnetales within gymnosperms. These results demonstrate that InPhyNet enables biologically meaningful network inference at unprecedented scales.

The development of sequencing technologies has led to an unprecedented availability of genomic data, but scalable methods for analyzing such data have not kept pace with demand. Phylogenetic trees can be inferred with thousands of taxa, but model-based phylogenetic network inference is only computationally feasible for a few dozen taxa. Here, we present a novel method for accurate semi-directed phylogenetic network inference that scales linearly with the number of taxa in the input data by merging independently estimated, non-overlapping networks.
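One way to see why merging non-overlapping networks can scale linearly: if every independently inferred subnetwork is capped at a fixed number of taxa k, each subproblem has bounded cost, and the number of subproblems is ⌈n/k⌉, which grows linearly in the number of taxa n (assuming the merge step is itself at most linear). The sketch below illustrates only this counting argument and is not InPhyNet's own procedure. The helpers `partition_taxa` and `num_subproblems` are hypothetical names, and the contiguous slicing is a placeholder for whatever subset-selection strategy a real pipeline would use. The cap of 30 taxa mirrors the practical limit for model-based methods mentioned in the abstract.

```python
import math
from typing import List, Sequence


def partition_taxa(taxa: Sequence[str], max_size: int) -> List[List[str]]:
    """Split taxa into disjoint, contiguous subsets of at most max_size taxa.

    Hypothetical helper for illustration only: a real pipeline would group
    closely related taxa rather than slicing an arbitrary ordering.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    # Each slice is non-overlapping, and together the slices cover every taxon once.
    return [list(taxa[i:i + max_size]) for i in range(0, len(taxa), max_size)]


def num_subproblems(n_taxa: int, max_size: int) -> int:
    # ceil(n / k): for a fixed cap k this grows linearly with n.
    return math.ceil(n_taxa / max_size)


if __name__ == "__main__":
    taxa = [f"t{i}" for i in range(1158)]  # same taxon count as the land-plant dataset
    subsets = partition_taxa(taxa, max_size=30)
    print(len(subsets), num_subproblems(len(taxa), 30))
```

With 1,158 taxa and a cap of 30, this yields 39 bounded-size subproblems; doubling the number of taxa roughly doubles the number of subproblems rather than the cost of any single one.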