Advances in Estimating Level-1 Phylogenetic Networks from Unrooted SNPs

Abstract

We address the problem of estimating a phylogenetic network from SNPs (single nucleotide polymorphisms, i.e., bi-allelic markers that have evolved under the infinite sites assumption). We focus on level-1 phylogenetic networks (i.e., networks whose cycles are node-disjoint), since more complex networks are unidentifiable. We provide a polynomial-time quartet-based method that we prove correctly reconstructs the unrooted topology of any level-1 phylogenetic network N, given a set of SNPs that covers all the bipartitions of N, even if the ancestral state is not known, provided that all cycles have length at least 5. We also prove that an algorithm developed by Dan Gusfield (JCSS 2005) correctly recovers the unrooted topology in polynomial time in this case. To the best of our knowledge, this is the first result to establish that the unrooted topology of a level-1 network is uniquely recoverable from SNPs without known ancestral states. We also present a stochastic model of DNA evolution, and we prove that both methods (our quartet-based method and Gusfield's method) are statistically consistent estimators of the unrooted topology of the level-1 phylogenetic network under this model. For multi-state homoplasy-free characters, we prove that our quartet-based method correctly constructs the unrooted topology of level-1 networks under the same condition (all cycles of length at least 5), whereas Gusfield's algorithm cannot be used in that setting. These results assume access to an oracle that indicates which sites in the DNA alignment are homoplasy-free, and we show that the methods are robust, under some conditions, to oracle errors.
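
To make the input concrete, the following is a minimal sketch of how a bi-allelic SNP with unknown ancestral state induces an unordered bipartition of the taxa, and how that bipartition in turn induces quartet splits. It is not the authors' reconstruction method and not Gusfield's algorithm; the function names `snp_bipartition` and `induced_quartets`, the string encoding of a site, and the toy taxon names are all assumptions made for illustration.

```python
from itertools import combinations


def snp_bipartition(column, taxa):
    """Return the unordered bipartition induced by one bi-allelic site.

    `column` holds one character state per taxon, in the order of `taxa`.
    Hypothetical helper: returns None if the site is not bi-allelic.
    """
    states = sorted(set(column))
    if len(states) != 2:
        return None  # constant or multi-state site induces no bipartition here
    side_a = frozenset(t for t, s in zip(taxa, column) if s == states[0])
    side_b = frozenset(taxa) - side_a
    # A frozenset of the two sides is unordered, so no ancestral state is needed.
    return frozenset((side_a, side_b))


def induced_quartets(bipartition):
    """Return every quartet split xy|zw with x, y on one side and z, w on the other."""
    side_a, side_b = bipartition
    quartets = set()
    for pair_a in combinations(sorted(side_a), 2):
        for pair_b in combinations(sorted(side_b), 2):
            quartets.add(frozenset((frozenset(pair_a), frozenset(pair_b))))
    return quartets


# Toy example (assumed data): five taxa and one SNP written with two labelings.
taxa = ["a", "b", "c", "d", "e"]
site = snp_bipartition("00111", taxa)
relabeled = snp_bipartition("11000", taxa)
quartets = induced_quartets(site)
```

In this toy case, swapping the two allele labels leaves the bipartition unchanged, which is the sense in which the ancestral state is not required, and the split {a, b} | {c, d, e} induces the three quartets ab|cd, ab|ce, and ab|de.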