Fast and accurate tree of blobs reconstruction under the network multispecies coalescent

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Gene flow between species or populations is an important force in evolution, modeled by the network multispecies coalescent. Phylogenetic network reconstruction under this model is notoriously challenging, with the leading methods scaling to just tens of species. Divide-and-conquer is a promising path forward; however, methods with statistical consistency guarantees require the tree of blobs (TOB), which displays only the tree-like parts of the network, to perform subset decomposition. TOB reconstruction is challenging in its own right, with the leading method TINNiK having time complexity O ( n 5 + n 4 k ), where k is the number of input gene trees and n is the number of species. In prior work, we introduced a new framework for TOB reconstruction that operates by (1) seeking a refinement of the TOB and then (2) contracting edges in it. For step (1), we estimate a tree using TREE-QMC, a fast quartet-based species tree method with time complexity O ( n 3 k ). For step (2), we contract edges based on the same hypothesis tests as TINNiK, which are applicable to subsets of four taxa. On simulated data sets, our method, referred to as TOB-QMC, was faster and more accurate than TINNiK.

Here, we provide the theoretical justification for our framework. For step (1), we show that Weighted Quartet Consensus converges to a TOB refinement with high probability, as the number k of gene trees tends to infinity, motivating the use of fast quartet-based methods for species tree estimation such as TREE-QMC or ASTRAL. ASTRAL, in particular, is statistically consistent in this context. For step (2), we show that sampling just O ( n ) 4-taxon subsets around each edge is sufficient to ensure that false positive edges in a TOB refinement are contracted with high probability, as k tends to infinity. Thus, our framework enables statistically consistent TOB estimation and is fast, with asymptotic runtime dominated by tree reconstruction in step (1).

Article activity feed