Efficient Bayesian Phylogenetics under the Infinite Sites Model
Abstract
Bayesian phylogenetic inference from molecular sequences can provide key insights into the evolutionary history of populations. Existing tools, however, often scale poorly with sample size. We present inPhynite, a highly efficient Bayesian phylogenetics algorithm for genomic datasets compatible with the infinite sites mutation model. A key advantage of this model is that likelihood calculation, which typically incurs a substantial computational cost, becomes trivial. We show that under the infinite sites assumption, it is possible to sample a coarse space of mutations and coalescences from which complete phylogenetic trees can be recovered. We design an efficient Markov chain over this space jointly with effective population size trajectories, which are modeled as piecewise constant functions. On real and synthetic data, our method significantly outperforms competing methods, improving statistical efficiency by a factor of over 225 on large datasets without any loss in accuracy. Finally, we demonstrate how inPhynite can help us understand the evolutionary history and past effective population sizes of human populations based on mitochondrial DNA.
Summary
Inferring the phylogenetic tree and evolutionary parameters from a sample of molecular sequences plays a key role in the study of how populations evolve over time. Existing inference algorithms face major computational challenges due to the large size of the phylogenetic tree space and the high cost of evaluating the phylogenetic likelihood. We show that under the infinite sites model of mutations, it is possible to overcome these limitations by instead conducting inference over an ordered sequence of genotypes that encodes the essential information in the tree. This approach achieves superior statistical efficiency compared to existing methods under a range of evolutionary conditions.
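To make "compatible with the infinite sites model" concrete, the following is a minimal sketch of the classical four-gamete test: binary sequences admit a perfect phylogeny, with each site mutating at most once, exactly when no pair of sites shows all four combinations 00, 01, 10 and 11. This is a standard textbook check, not necessarily how inPhynite validates its input. The function name and the encoding of haplotypes as lists of 0/1 integers are assumptions made for illustration.

```python
from itertools import combinations


def is_infinite_sites_compatible(haplotypes):
    """Four-gamete test (illustrative helper, not part of inPhynite).

    haplotypes: list of equal-length sequences of 0/1 values,
    one row per sample and one column per segregating site.
    Returns True if no pair of sites exhibits all four gametes.
    """
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for h in haplotypes:
        if len(h) != n_sites:
            raise ValueError("all haplotypes must have the same length")
        if any(allele not in (0, 1) for allele in h):
            raise ValueError("alleles must be coded as 0 or 1")

    # Check every unordered pair of sites for the four-gamete pattern.
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


# Hypothetical toy data: the first sample set passes, the second does not.
compatible = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
incompatible = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(is_infinite_sites_compatible(compatible))
print(is_infinite_sites_compatible(incompatible))
```

The pairwise scan costs time quadratic in the number of sites, which is adequate for a sanity check on small alignments; faster perfect-phylogeny algorithms exist but are beyond this sketch.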