A method for massively scalable inference of phylogenetic networks

Abstract

Recent advancements in sequencing technologies have enabled large-scale phylogenomic analyses. While these analyses often rely on phylogenetic trees, increasing evidence suggests that non-treelike evolutionary events, such as hybridization and horizontal gene transfer, are prevalent in the evolutionary histories of many species, in which case tree-based models are insufficient. Phylogenetic networks can capture such complex evolutionary histories, but current methods for accurately inferring them lack scalability, while implicit network inference methods are fast but lack biological interpretability. Here, we introduce a novel method called InPhyNet that merges a set of non-overlapping, independently inferred level-1 networks into a unified topology, achieving linear scalability while maintaining high accuracy under the multispecies network coalescent model. We prove that a pipeline utilizing InPhyNet can be statistically consistent if the proper methodology is used. Using simulations, we infer networks with up to 200 taxa and show that divide-and-conquer pipelines utilizing InPhyNet allow for accurate network inference at scales and speeds previously unseen. Re-analyzing a phylogeny of 1,158 land plants with InPhyNet, we recover known reticulate events and illustrate how InPhyNet enables analyses of biologically meaningful reticulate phylogenies at unprecedented scales.