The tree labeling polytope: a unified approach to ancestral reconstruction problems

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Motivation

Reconstructing unobserved ancestral states of a phylogenetic tree provides insight into the history of evolving systems and is one of the fundamental problems in phylogenetics. For a fixed phylogenetic tree, the most parsimonious ancestral reconstruction – a solution to the small parsimony problem – can be efficiently found using the dynamic programming algorithms of Fitch-Hartigan and Sankoff. Ancestral reconstruction is important in many applications including inferring the routes of metastases in cancer, deriving the transmission history of viruses, determining the direction of cellular differentiation in organismal development, and detecting recombination and horizontal gene transfer in phylogenetic networks. However, most of these applications impose additional global constraints on the reconstructed ancestral states, which break the local structure required in the recurrences of Fitch-Hartigan and Sankoff.

Results

We introduce an alternative, polyhedral approach to ancestral reconstruction problems using the tree labeling polytope , a geometric object whose vertices represent the feasible ancestral labelings of a tree. This framework yields a polynomial-time linear programming algorithm for the small parsimony problem . More importantly, the tree labeling polytope facilitates the incorporation of additional constraints that arise in modern ancestral reconstruction problems. We demonstrate the utility of our approach by deriving mixed-integer programming algorithms with a small number of integer variables and strong linear relaxations for three such problems: the parsimonious migration history problem, the softwired small parsimony problem on phylogenetic networks, and the convex recoloring problem on trees. Our algorithms outperform existing state-of-the-art methods on both simulated and real datasets. For instance, our algorithm scales to trace routes of cancer metastases in trees with thousands of leaves, enabling the analysis of large trees generated by recent single-cell sequencing technologies. On a mouse model of metastatic lung adenocarcinoma, the tree labeling polytope allows us to infer simpler migration histories compared to previous results.

Availability

Python implementations of the algorithms provided in this work are available at: github.com/raphael-group/tree-labeling-polytope .

Article activity feed