Identification of transmission clusters with high influence on shared parameter estimates in Bayesian phylodynamics

Abstract

Estimating a shared rate of transmission from multiple clusters of pathogen genome sequences has become increasingly common. Such inferences are appealing, as they allow the overall characterisation of a polyphyletic population and the inclusion of clusters that could not easily be analysed by themselves, for example because they contain very few sequence samples. Differences in sample size, however, can result in substantial variation in how much information each cluster carries about the rate of transmission. When the clusters additionally differ in their respective transmission rates, the shared estimate will not represent all clusters equally well. It is therefore essential to assess the influence of each cluster on the estimate. Here, we focus on Bayesian phylodynamic inference, which combines an evolutionary model with an epidemiological model to infer transmission rates from clusters of genetic sequences. We build on results about case influence in Bayesian models to suggest a computationally inexpensive measure that quantifies influence in the context of inference from multiple conditionally independent clusters, regardless of the exact epidemiological model. We further analyse clusters simulated under various birth-death-sampling models to evaluate the performance of the method and to explore which properties can generally drive high influence. We find that influence arises from a complex interplay of a cluster's size, persistence and rate relative to all included clusters. We further highlight that the shared estimate from clusters with different rates does not correspond to a population mean or median, but rather to a certainty-weighted average. Finally, we demonstrate the practical insights that can be gained through influence assessment in an analysis of SARS-CoV-2 genomes sampled in Germany before the implementation of the first nationwide lockdown in early 2020. Our results illustrate how variable the influence of individual clusters can be and how quantifying it can guide further studies, for example into transmission rate differences.
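The statement that a shared estimate behaves like a certainty-weighted average can be illustrated with a minimal sketch. It is not the influence measure proposed in the article. It assumes, purely for illustration, that each cluster's information about the rate is summarised by a Gaussian with its own mean and standard deviation, and that the prior is flat, in which case the pooled estimate is the precision-weighted mean. The function name `pooled_estimate` and the numbers are hypothetical.

```python
from statistics import mean, median


def pooled_estimate(means, sds):
    """Precision-weighted pooling of per-cluster Gaussian summaries.

    Assumption (not from the article): each cluster contributes an
    independent Gaussian likelihood for the shared rate, and the prior
    is flat, so the pooled mean is the precision-weighted average.
    """
    precisions = [1.0 / sd**2 for sd in sds]
    total = sum(precisions)
    pooled_mean = sum(p * m for p, m in zip(precisions, means)) / total
    pooled_sd = total**-0.5
    weights = [p / total for p in precisions]  # share of total certainty
    return pooled_mean, pooled_sd, weights


# Hypothetical per-cluster rate summaries: one well-resolved cluster
# with a low rate and two poorly resolved clusters with higher rates.
cluster_means = [1.0, 2.0, 3.0]
cluster_sds = [0.1, 0.5, 1.0]

pooled_mean, pooled_sd, weights = pooled_estimate(cluster_means, cluster_sds)

print("pooled mean:", round(pooled_mean, 3))
print("simple mean:", mean(cluster_means), "median:", median(cluster_means))
print("weights:", [round(w, 3) for w in weights])
```

In this toy setting the first cluster carries most of the weight, so the pooled value sits close to its rate rather than near the mean or median of the three cluster rates.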
