wQFM-GDL Enables Accurate Quartet-based Genome-scale Species Tree Inference Under Gene Duplication and Loss

Abstract

Species tree estimation from genome-wide data has transformed evolutionary studies, particularly in the presence of gene tree discordance. Gene trees often differ from species trees because of processes such as incomplete lineage sorting (ILS) and gene duplication and loss (GDL). Quartet-based species tree estimation methods have gained substantial popularity for their accuracy and statistical guarantees. However, most of these methods (e.g., ASTRAL, TREE-QMC, wQFM/wQFM-TREE) rely on single-copy gene trees and model ILS but not GDL, limiting their applicability to large genomic datasets. ASTRAL-Pro, a recent advancement, has refined quartet similarity measures to incorporate both orthology and paralogy, improving species tree inference under GDL. Among other quartet-based methods, wQFM-DISCO converts multicopy gene family trees into single-copy gene trees using DISCO and applies the wQFM algorithm to the single-copy trees. Even so, ASTRAL-Pro has remained the only quartet-based summary method to explicitly model gene duplication and loss. In this study, we propose wQFM-GDL, an extension of the wQFM and wQFM-TREE algorithms that supports gene family trees and models gene duplication and loss by leveraging the concept of speciation-driven quartets introduced in ASTRAL-Pro. Our algorithm outperforms ASTRAL-Pro3 in most model conditions, offering a promising alternative for estimating species trees in the presence of GDL.
