Biological causes and impacts of rugged tree landscapes in phylodynamic inference
Abstract
Phylodynamic analysis has been instrumental in elucidating the epidemiological and evolutionary dynamics of pathogens. The Bayesian approach to phylodynamics integrates out phylogenetic uncertainty, which is typically substantial in phylodynamic datasets due to low genetic diversity. Bayesian phylodynamic analysis does not, however, scale with modern datasets, partly due to difficulties in traversing tree space. Here, we set out to characterize the tree space of phylodynamic inference and assess its impacts on analysis difficulty and key biological estimates. By running extensive Bayesian analyses of 15 classic large phylodynamic datasets and carefully analyzing the posterior samples, we find that the posterior landscape in tree space (“tree landscape”) is diffuse yet rugged, leading to widespread tree sampling problems that usually stem from the sequences in a small part of the tree. We develop clade-specific diagnostics to show that a few sequences, including putative recombinants and recurrent mutants, frequently drive tree space ruggedness and sampling problems, although existing data-quality tests show limited power to detect such sequences. The sampling problems can significantly impact phylodynamic inferences or even distort major biological conclusions; the impact is usually stronger on “local” estimates (e.g., introduction history) that are associated with particular clades than on “global” parameters (e.g., demographic trajectory) that are governed by the general tree shape. We evaluate existing and newly developed MCMC diagnostics, and offer strategies for optimizing MCMC settings and mitigating impacts of the sampling problems. Our findings highlight the need for, and directions toward, efficient traversal of the rugged tree landscape, ultimately advancing scalable and reliable phylodynamics.
Bayesian phylodynamics is central to epidemiological studies but computationally challenging. One major challenge is posed by the need to explore a vast and complex tree space. Phylodynamic datasets usually comprise many sequences with limited genetic diversity, distinct from traditional phylogenetic datasets, warranting characterization of phylodynamic tree space and inference performance. Here, we demonstrate that: 1) the phylodynamic tree landscape is highly rugged, leading to widespread tree sampling problems that are frequently driven by a few sequences, and 2) these problems can distort biological conclusions. We develop new diagnostics to identify the problematic sequences and highlight potential solutions to mitigate their impacts. We offer strategies to optimize phylodynamic analysis workflows and to develop algorithms for navigating tree space ruggedness, thereby facilitating infectious disease investigation.
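To illustrate what a clade-level view of tree sampling problems can look like, the sketch below compares how often each clade appears in two independent MCMC chains and ranks clades by disagreement. This is not the diagnostic developed in the article; it is a minimal, generic split-frequency comparison, and every name in it (`clade_frequencies`, `compare_chains`, `mean_sd_of_frequencies`, the `min_freq` threshold, and the representation of a tree as a set of tip-name sets) is an assumption made for illustration.

```python
from collections import Counter
from math import sqrt


def clade_frequencies(trees):
    """Fraction of sampled trees that contain each clade.

    Assumed input: each tree is a set of clades, and each clade is a
    frozenset of tip names (a hypothetical, pre-parsed representation).
    """
    counts = Counter()
    for clades in trees:
        counts.update(clades)
    return {clade: n / len(trees) for clade, n in counts.items()}


def compare_chains(chain_a, chain_b, min_freq=0.1):
    """Rank clades by the absolute difference in frequency between chains.

    Clades rarer than min_freq in both chains are ignored, since their
    frequencies are noisy. Returns (difference, clade, freq_a, freq_b) rows,
    largest difference first.
    """
    freq_a = clade_frequencies(chain_a)
    freq_b = clade_frequencies(chain_b)
    rows = []
    for clade in set(freq_a) | set(freq_b):
        pa = freq_a.get(clade, 0.0)
        pb = freq_b.get(clade, 0.0)
        if max(pa, pb) >= min_freq:
            rows.append((abs(pa - pb), clade, pa, pb))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


def mean_sd_of_frequencies(rows):
    """Average sample standard deviation of clade frequencies across two chains.

    For two values the sample standard deviation is |pa - pb| / sqrt(2).
    """
    if not rows:
        return 0.0
    return sum(diff / sqrt(2) for diff, *_ in rows) / len(rows)


# Toy example: the chains agree on clade {A, B} but place tip X differently.
chain_a = [{frozenset("AB"), frozenset("XC")}] * 4
chain_b = [{frozenset("AB"), frozenset("XD")}] * 4

rows = compare_chains(chain_a, chain_b)
for diff, clade, pa, pb in rows:
    print(sorted(clade), f"chain A={pa:.2f}", f"chain B={pb:.2f}", f"diff={diff:.2f}")
print("mean SD of clade frequencies:", round(mean_sd_of_frequencies(rows), 4))
```

In this toy case the clades with the largest disagreement all contain tip X, which is the kind of signal that would point to a small set of sequences as the source of a sampling problem. A real analysis would need to parse the sampled trees from the inference software's output and account for burn-in, neither of which is covered here.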