Biological causes and impacts of rugged tree landscapes in phylodynamic inference

Jiansi Gao
Marius Brusselmans
Luiz M. Carvalho
Marc A. Suchard
Guy Baele
Frederick A. Matsen

Abstract

Phylodynamic analysis has been instrumental in elucidating epidemiological and evolutionary dynamics of pathogens. Bayesian phylodynamics integrates out phylogenetic uncertainty, which is typically substantial in phylodynamic datasets due to limited genetic diversity. Phylodynamic inference does not, however, scale with modern datasets, partly due to difficulties in traversing tree space. Here, we characterize tree space and landscape in phylodynamic inference and assess its impacts on analysis difficulty and key biological estimates. By running extensive Bayesian analyses of 15 classic large phylodynamic datasets and carefully analyzing the posterior samples, we find that the posterior tree landscape is diffuse yet rugged, leading to widespread tree sampling problems that usually stem from sequences in a small part of the tree. We develop clade-specific diagnostics to show that a few sequences, including putative recombinants and recurrent mutants, frequently drive the ruggedness and sampling problems, although existing data-quality tests show limited power to detect them. The sampling problems can significantly impact phylodynamic inferences or distort major biological conclusions; the impact is usually stronger on “local” estimates (e.g., introduction history) associated with particular clades than on “global” parameters (e.g., demographic trajectory) governed by general tree shape. We evaluate existing and newly developed MCMC diagnostics, and offer strategies for optimizing phylodynamic analysis settings and mitigating sampling problem impacts. Our findings highlight the need and directions to develop efficient traversal over rugged tree landscapes, ultimately advancing scalable and reliable phylodynamics.

Bayesian phylodynamics is central to epidemiological studies, but exploring the vast and complex tree space is computationally challenging. Phylodynamic datasets comprise many highly similar sequences, sampled through time, creating a uniquely structured landscape of optimal trees. Here, we show that phylodynamic tree landscapes are often highly rugged, with multiple peaks separated by difficult-to-cross valleys. These features lead to widespread sampling problems which are often driven by a few sequences. These problems can significantly impact phylodynamic estimates, especially those associated with particular clades, distorting biological conclusions. We develop diagnostics to identify problematic sequences and provide solutions to mitigate their impacts. We offer strategies to optimize phylodynamic analysis workflows and to develop algorithms for navigating rugged landscapes, thereby advancing infectious disease investigation.

Version published to 10.1101/2025.06.10.657742 on bioRxiv
Jun 12, 2025
