wQFM-TREE: highly accurate and scalable quartet-based species tree inference from gene trees
Abstract
Motivation
Summary methods are becoming increasingly popular for species tree estimation from multi-locus data in the presence of gene tree discordance. Accurate Species TRee Algorithm (ASTRAL), a leading method in this class, solves the Maximum Quartet Support Species Tree problem within a constrained solution space, while heuristics like Weighted Quartet Fiduccia–Mattheyses (wQFM) and Weighted Quartet MaxCut (wQMC) use weighted quartets and a divide-and-conquer strategy. Recent studies showed wQFM to be more accurate than ASTRAL and wQMC, though its scalability is hindered by the computational demands of explicitly generating and weighting Θ(n⁴) quartets. Here, we introduce wQFM-TREE, a novel summary method that enhances wQFM by avoiding explicit quartet generation and weighting, enabling its application to large datasets.
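To make the Θ(n⁴) cost concrete, the following is a minimal sketch of the naive, explicit quartet weighting that wQFM-TREE is designed to avoid. It is not the authors' implementation. It assumes gene trees are given as nested Python tuples of taxon names (a hypothetical representation chosen for brevity) and that a quartet's weight is the number of gene trees inducing that quartet topology, which is one common weighting. The function names `clusters`, `quartet_topology`, and `weighted_quartets` are illustrative only.

```python
from collections import Counter
from itertools import combinations
from math import comb


def clusters(tree):
    """Return (leaf set, list of leaf sets under each internal node)."""
    if isinstance(tree, str):  # a leaf is just a taxon name
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clusters(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def quartet_topology(cluster_list, four):
    """Return the split xy|zw induced on four taxa, or None if unresolved."""
    four = frozenset(four)
    for cluster in cluster_list:
        inside = cluster & four
        if len(inside) == 2:  # this cluster separates two taxa from the other two
            return frozenset([inside, four - inside])
    return None


def weighted_quartets(gene_trees):
    """Count, for each quartet topology, how many gene trees induce it."""
    weights = Counter()
    for tree in gene_trees:
        leaves, cluster_list = clusters(tree)
        # Explicit enumeration: C(n, 4) four-taxon subsets per gene tree.
        for four in combinations(sorted(leaves), 4):
            topology = quartet_topology(cluster_list, four)
            if topology is not None:
                weights[topology] += 1
    return weights


def quartet_subsets(n):
    """Number of four-taxon subsets of n taxa, which grows as Θ(n⁴)."""
    return comb(n, 4)
```

With 1000 taxa, `quartet_subsets(1000)` is already about 4.1 × 10¹⁰ four-taxon subsets per gene tree, which illustrates why explicit enumeration limits scalability.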
Results
Extensive simulations under diverse and challenging model conditions, with hundreds or thousands of taxa and genes, consistently demonstrate that wQFM-TREE matches or improves upon the accuracy of ASTRAL. It outperformed ASTRAL in 25 of 27 model conditions (statistically significant in 20) involving 200–1000 taxa. Moreover, applying wQFM-TREE to re-analyze the green plant dataset from the One Thousand Plant Transcriptomes Initiative produced a tree highly congruent with established evolutionary relationships of plants. wQFM-TREE’s remarkable accuracy and scalability make it a strong competitor to leading methods. Its algorithmic and combinatorial innovations also enhance quartet-based computations, advancing phylogenetic estimation.
Availability and implementation
wQFM-TREE is freely available in open source form at https://github.com/abdur-rafi/wQFM-TREE.