On the Comparison of LGT networks and Tree-based Networks
Abstract
Phylogenetic networks are widespread representations of evolutionary histories for taxa that undergo hybridization or lateral gene transfer (LGT) events. There are now many tools to reconstruct such networks, but no clearly established metric to compare them, unlike trees, for which the Robinson-Foulds distance has become a standard. However, to assess the quality of reconstruction methods, one needs a way to compare networks quantitatively, for example to evaluate predictions against a simulated ground truth. Despite years of effort in developing metrics, known dissimilarity measures are either incapable of distinguishing all pairs of different networks or extremely difficult to compute. Since it appears challenging, if not impossible, to create an ideal metric for all classes of networks, it may be relevant to design metrics for specialized applications.
In this article, we introduce a metric on LGT networks. These consist of trees with additional arcs that represent transfer events, and they are useful for scenarios involving genetic exchanges between co-existing species. Our metric is based on edit operations, namely the addition/removal of transfer arcs and the contraction/expansion of arcs of the base tree, which together connect the space of all LGT networks. We study its computational complexity and show that it is linear-time computable if the order of transfers along a branch is unconstrained but NP-hard otherwise, in which case we provide a fixed-parameter tractable (FPT) algorithm. We implemented our algorithms and demonstrate their applicability in three numerical experiments. The first shows the scalability of the computation of the metric on random simulated networks. The other two are proofs of concept for concrete applications: the evaluation of predicted “transfer highways” on bacterial data, and the tuning of cost values in gene-tree reconciliations.
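For context, the Robinson-Foulds distance that the abstract cites as the standard for trees can be sketched in a few lines. This is only an illustration of the tree baseline, not the LGT-network metric introduced in the article. The nested-tuple tree encoding and the function names `clusters` and `rf_distance` are assumptions made for this example, and the rooted, cluster-based variant of the distance is used.

```python
def clusters(tree):
    """Return the non-trivial clusters of a rooted tree.

    Assumed encoding (hypothetical, for illustration only): a tree is
    either a leaf label (a string) or a tuple of subtrees. A cluster is
    the frozenset of leaf labels found below an internal node.
    """
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a leaf contributes only its own label
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    # The root cluster is shared by every tree on the same taxa, so drop it.
    found.discard(all_leaves)
    return found


def rf_distance(tree_a, tree_b):
    """Rooted Robinson-Foulds distance: clusters present in exactly one tree."""
    return len(clusters(tree_a) ^ clusters(tree_b))


if __name__ == "__main__":
    balanced = (("a", "b"), ("c", "d"))
    caterpillar = ((("a", "b"), "c"), "d")
    print(rf_distance(balanced, caterpillar))
```

In this rooted setting, each differing cluster corresponds to one contraction or expansion of an internal arc, which is why contraction/expansion is a natural edit operation on a base tree. How such operations interact with transfer arcs is specific to the article's metric and is not modeled here.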