Identification of transmission clusters with high influence on shared parameter estimates in Bayesian phylodynamics

Ariane Weber
Ruth Boersma
Sebastian Duchene

Abstract

Estimating a shared rate of transmission from multiple clusters of pathogen genome sequences has become increasingly common. Such inferences are appealing, as they allow the overall characterisation of a polyphyletic population and the inclusion of clusters that could not easily be analysed by themselves, for example due to a very small number of sequence samples. Differences in sample size, however, can result in substantial variation in the information each cluster carries about the rate of transmission. When the clusters additionally differ in their respective transmission rates, the shared estimate does not represent all clusters equally well. It is therefore essential to assess the influence of each cluster on the estimate. Here, we focus on Bayesian phylodynamic inference, which combines an evolutionary model with an epidemiological model to infer transmission rates from clusters of genetic sequences. We build on results about case influence in Bayesian models to suggest a computationally inexpensive measure that quantifies influence in the context of inference from multiple conditionally independent clusters, regardless of the exact epidemiological model. We further analyse clusters simulated under various birth-death-sampling models to evaluate the performance of the method and to explore which properties can generally drive high influence. We find the influence to be a complex interplay of size, persistence, and rate relative to all included clusters. We further highlight that the shared estimate from clusters of different rates does not correspond to a population mean or median, but rather to a certainty-weighted average. Finally, we demonstrate the practical insights that can be gained through influence assessment in the analysis of SARS-CoV-2 genomes sampled in Germany before the implementation of the first nationwide lockdown in early 2020. Our results illustrate how variable the influence of individual clusters can be and how quantifying it can guide further studies, for example into transmission rate differences.
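
The remark that a shared estimate behaves like a certainty-weighted average can be illustrated with a minimal sketch. It assumes each cluster's information about the rate is summarised by a Gaussian with a mean and a standard deviation, in which case combining conditionally independent clusters gives a precision-weighted mean. This is a textbook simplification, not the birth-death-sampling models or the influence measure of the preprint. The function names, the leave-one-out shift used as a crude influence proxy, and all numbers are hypothetical.

```python
import math
import statistics

def shared_estimate(means, sds):
    """Precision-weighted mean and sd for a parameter shared across clusters.

    Assumption: each cluster's likelihood for the shared rate is Gaussian.
    """
    precisions = [1.0 / sd ** 2 for sd in sds]
    total = sum(precisions)
    pooled_mean = sum(p * m for p, m in zip(precisions, means)) / total
    pooled_sd = math.sqrt(1.0 / total)
    return pooled_mean, pooled_sd

def leave_one_out_shifts(means, sds):
    """Change in the shared estimate when each cluster is dropped in turn.

    A crude illustrative proxy for influence, not the preprint's measure.
    """
    full_mean, _ = shared_estimate(means, sds)
    shifts = []
    for i in range(len(means)):
        rest_means = means[:i] + means[i + 1:]
        rest_sds = sds[:i] + sds[i + 1:]
        loo_mean, _ = shared_estimate(rest_means, rest_sds)
        shifts.append(full_mean - loo_mean)
    return shifts

# Hypothetical per-cluster rate estimates: one precise cluster, two vague ones.
cluster_means = [1.0, 1.2, 3.0]
cluster_sds = [0.1, 0.5, 1.0]

pooled_mean, pooled_sd = shared_estimate(cluster_means, cluster_sds)
shifts = leave_one_out_shifts(cluster_means, cluster_sds)

print("shared estimate:", round(pooled_mean, 3))
print("arithmetic mean:", round(statistics.mean(cluster_means), 3))
print("median:", statistics.median(cluster_means))
print("leave-one-out shifts:", [round(s, 3) for s in shifts])
```

With these made-up inputs the shared estimate sits close to the most precisely estimated cluster, below both the arithmetic mean and the median of the cluster-level rates, and dropping that cluster moves the estimate far more than dropping either of the others.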

Version published to 10.1101/2025.11.18.25340546 on medRxiv
Nov 19, 2025
