Efficient Phylogenetic Inference Using SNP-Based Approaches: A Comparison with Full Sequence Data

Abstract

Mutations in specific genomic regions or genes serve as reliable indicators of phylogenetic relationships, with single nucleotide polymorphisms (SNPs) playing a crucial role in population phylogenetic studies. Traditional distance-based phylogenetic algorithms have a time complexity proportional to l·n², where n is the number of sequences and l is their length [1], [2]. This high computational cost becomes a bottleneck in phylogeny reconstruction, particularly when l > n.

To overcome this limitation, we propose an SNP-based approach to phylogenetic tree inference, focusing exclusively on variant (mutated) positions rather than entire sequences. This method significantly reduces computational time while maintaining accuracy. We compare phylogenies inferred from SNP data and full sequence data (including both SNPs and invariant sites) across multiple metrics.
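As a rough illustration of restricting an alignment to its variant positions, the following minimal Python sketch filters out invariant columns and checks that pairwise mismatch counts are unchanged. It is not the authors' implementation: the function names (variant_columns, snp_alignment, mismatch_counts) and the toy sequences are hypothetical, and it assumes equal-length, gap-free aligned sequences.

```python
from itertools import combinations

def variant_columns(seqs):
    """Return indices of alignment columns where the sequences do not all agree."""
    length = len(seqs[0])
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]

def snp_alignment(seqs):
    """Keep only the variant columns of each sequence (hypothetical helper)."""
    cols = variant_columns(seqs)
    return ["".join(s[i] for i in cols) for s in seqs]

def mismatch_counts(seqs):
    """Count differing positions for every pair: n(n-1)/2 comparisons of length l."""
    return {
        (a, b): sum(x != y for x, y in zip(seqs[a], seqs[b]))
        for a, b in combinations(range(len(seqs)), 2)
    }

# Toy alignment (made up for illustration): 4 sequences, 10 sites, 3 variant sites.
full = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC", "ACGAACGTGC"]
snps = snp_alignment(full)

# Invariant columns never contribute a mismatch, so the counts agree.
same_counts = mismatch_counts(full) == mismatch_counts(snps)
```

In this toy case each pairwise comparison scans 3 columns instead of 10 while producing the same mismatch counts; any distance normalized by sequence length would still need the original length l.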

Our results show that heuristic phylogenetic trees constructed from SNPs achieve parsimony scores nearly identical to those derived from full sequence data, as parsimony primarily depends on variant positions. Additionally, under the Jukes-Cantor 1969 (JC69) model, log-likelihood scores for SNP-based and full-sequence-based trees exhibit a strong correlation when evaluated using the same tree topology, branch lengths, and maximum likelihood parameters. These findings demonstrate that SNP-based methods can streamline phylogenetic analysis while preserving accuracy.
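The dependence of parsimony on variant positions can be sketched with Fitch's small-parsimony algorithm, under which an invariant column requires zero changes on any tree. The Python sketch below scores one fixed rooted binary tree only; it does not perform the heuristic tree search described above, and the tree encoding (nested tuples of leaf indices), the function names, and the toy sequences are assumptions made for illustration.

```python
def fitch_column(tree, column):
    """Return (state_set, cost) for one alignment column on a rooted binary tree.

    A tree is either an int (leaf index into the column) or a (left, right) tuple.
    """
    if isinstance(tree, int):
        return {column[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch_column(left, column)
    right_states, right_cost = fitch_column(right, column)
    common = left_states & right_states
    if common:
        # Children can share a state: no extra change needed at this node.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_states | right_states, left_cost + right_cost + 1

def parsimony_score(tree, seqs):
    """Sum the Fitch cost over all columns of the alignment."""
    return sum(
        fitch_column(tree, [s[i] for s in seqs])[1]
        for i in range(len(seqs[0]))
    )

# Toy data (made up): the full alignment and its variant columns only.
full = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC", "ACGAACGTGC"]
variant = [i for i in range(len(full[0])) if len({s[i] for s in full}) > 1]
snps = ["".join(s[i] for i in variant) for s in full]

tree = ((0, 1), (2, 3))  # hypothetical topology over the four sequences
full_score = parsimony_score(tree, full)
snp_score = parsimony_score(tree, snps)
```

For a fixed topology the two scores coincide, because every invariant column contributes a cost of zero. Likelihood behaves differently: invariant sites do contribute to the log-likelihood, which is consistent with the abstract reporting a correlation rather than equality for the JC69 scores.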
