Understanding drivers of phylogenetic clustering and terminal branch lengths distribution in epidemics of Mycobacterium tuberculosis

Curation statements for this article:
  • Curated by eLife


    Evaluation Summary:

    This is an interesting simulation-based study focusing on the genomic epidemiology of Mycobacterium tuberculosis. The work nicely relates key biological and epidemiological parameters to how M. tuberculosis isolates cluster together, and to the terminal branch lengths in M. tuberculosis phylogenies. These concepts have both been applied to comparative studies of M. tuberculosis success and have often been interpreted as reflecting differences in transmission. The author finds that clustering and terminal branch lengths can also be modified by differences in the latent period, the mutation rate or the sampling fraction. This work will be of broad interest to readers studying tuberculosis epidemiology and transmission modelling.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their name with the authors.)


Abstract

Detecting factors associated with transmission is important for understanding disease epidemics and for designing effective public health measures. Clustering and terminal branch length (TBL) analyses are commonly applied to genomic data sets of Mycobacterium tuberculosis (MTB) to identify sub-populations with increased transmission. Here, I used a simulation-based approach to investigate which epidemiological processes influence the results of clustering and TBL analyses, and whether differences in transmission can be detected with these methods. I simulated MTB epidemics with different dynamics (latency, infectious period, transmission rate, basic reproductive number R0, sampling proportion, sampling period, and molecular clock), and found that all considered factors, except for the length of the infectious period, affect the results of clustering and TBL distributions. I show that standard interpretations of this type of analysis ignore two main caveats: (1) clustering results and TBL depend on many factors that have nothing to do with transmission, and (2) clustering results and TBL say nothing about whether the epidemic is stable, growing, or shrinking, unless all the additional parameters that influence these metrics are known or assumed identical between sub-populations. An important consequence is that the optimal SNP threshold for clustering depends on the epidemiological conditions, and that sub-populations with different epidemiological characteristics should not be analyzed with the same threshold. Finally, these results suggest that the different clustering rates and TBL distributions consistently found between different MTB lineages are probably due to intrinsic bacterial factors, and do not necessarily indicate differences in transmission or evolutionary success.
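To make the dependence on the SNP threshold and on the molecular clock concrete, here is a minimal sketch; it is not the author's simulation pipeline. It assumes isolates are clustered by single linkage on a pairwise SNP distance matrix (two isolates are linked if they differ by at most `threshold` SNPs) and takes the clustering rate as the fraction of isolates that fall in clusters of two or more, which is one common definition among several. The function names `snp_clusters` and `clustering_rate` and the toy distances are hypothetical, chosen only for illustration. Halving every distance mimics a clock running at half the rate over the same time span.

```python
from itertools import combinations


def snp_clusters(dist, threshold):
    """Single-linkage clusters: isolates i and j are linked if dist[i][j] <= threshold."""
    n = len(dist)
    parent = list(range(n))  # union-find forest, one root per cluster

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist[i][j] <= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [sorted(g) for g in groups.values()]


def clustering_rate(dist, threshold):
    """Fraction of isolates belonging to a cluster with at least two members."""
    clusters = snp_clusters(dist, threshold)
    clustered = sum(len(c) for c in clusters if len(c) >= 2)
    return clustered / len(dist)


# Hypothetical pairwise SNP distances between four isolates.
dist = [
    [0, 3, 10, 25],
    [3, 0, 8, 24],
    [10, 8, 0, 20],
    [25, 24, 20, 0],
]
# Same time structure under a clock running at half the rate.
slow_clock = [[d / 2 for d in row] for row in dist]

print(clustering_rate(dist, 5))        # 0.5
print(clustering_rate(dist, 12))       # 0.75
print(clustering_rate(slow_clock, 5))  # 0.75
```

In this toy example, the clustering rate changes both with the chosen threshold and with the clock rate, even though nothing about transmission has changed, which illustrates why a single threshold may not suit sub-populations with different epidemiological or bacterial characteristics.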

Article activity feed

  1. Reviewer #1 (Public Review):

    This study by Menardo investigates the relationship between key biological and epidemiologic parameters of tuberculosis (including its mutation rate, transmission rate, latent and infectious intervals, and R0) and two measures that have been used to investigate differences in strain success: phylogenetic clustering and comparison of terminal branch lengths (TBL). The author simulates tuberculosis epidemics with different biological and epidemiological parameters, simulates M. tuberculosis sequences consistent with that transmission process, and then infers phylogenetic trees, enabling comparison of clustering and TBLs. The main conclusions are that clustering and TBLs can be influenced by parameters other than transmission, and that they do not tell us anything about whether an epidemic is stable, growing or shrinking. The latter point holds only if one presumes that the infectious period is unknown. If assumptions can be made about the infectious period for a given strain, location, etc., then TBL would correlate with R0. The strength of this manuscript is that it addresses an important emerging approach for comparing M. tuberculosis strains and inferring transmission and strain success. It illustrates the role that other biological and epidemiological parameters (beyond transmission and R0) can play in shaping these metrics, and in doing so urges caution against overinterpreting these metrics and comparisons.

  2. Reviewer #2 (Public Review):

    The paper "Understanding drivers of phylogenetic clustering and terminal branch length (TBL) distribution in epidemics of Mycobacterium tuberculosis" makes a laudable attempt to assess how various epidemiological parameters shape the structure of phylogenetic trees. The starting point for the study is that shorter terminal branch lengths and higher rates of clustering are typically interpreted as a signature of increased rates of transmission. This assumption has yet to be supported by rigorous formal analyses. In the current study, Fabrizio Menardo relies on simulations to show that higher transmission rates do indeed result in shorter terminal branch lengths and higher rates of clustering. However, and importantly, he also shows that other parameters, such as shorter latency periods and a slower molecular clock, can bring about the same changes. In fact, in theory, a contracting epidemic (reproductive number R0 < 1) can still display signatures of relatively increased transmission.

    A major strength of the paper is that it uses simulations to thoroughly assess the impact of key epidemic parameters on phylogenetic tree shapes. These include latency, infectious period, transmission rate, basic reproductive number R0, sampling proportion, and molecular clock rate. This approach is clearly very useful, and probably the best approach for a quantitative analysis of the relative contribution of key parameters to shaping a phylogenetic tree. The findings are interesting, and the paper does a good job of relating them to published studies that relied on clustering rates and terminal branch length estimates to make epidemiological inferences. One example is a 20-year longitudinal study from Malawi, where Lineage 2 isolates were found to have higher clustering rates (and hence were interpreted to be more transmissible), yet did not increase in frequency over the study period. Menardo concludes that this apparent paradox can only be explained by a shorter infectious period in Lineage 2 infections. In doing so, he re-interprets findings from earlier studies.

    Analysing clustering rates and identifying clustering thresholds is a more widely recognized challenge, whereas the challenge of interpreting TBLs has received less attention. Despite the challenges to interpreting TBLs and clustering rates identified by Menardo, I think these analyses can still have merit. In my opinion, differing latency times are a key factor to consider, whereas some of the other factors could perhaps be controlled for. For example, I am not certain how realistic it is to have different sampling proportions for two co-circulating TB types or lineages in the same location over the same time span. In addition, I think a short discussion of how working with dated (time-scaled) trees differs from working with trees built from genetic distances would be interesting. Assuming that a correct dated tree can be generated, this would at least nullify the effects on clustering and TBLs that stem from differing mutation rates.

    All in all, I believe the current work is important and timely. A formal quantitative framework for assessing the effect of epidemiological parameters on tree shapes will be highly useful for researchers working at the interface between epidemiology, surveillance and microbial genomics, in TB and beyond.
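    Reviewer #2's point about dated trees can be sketched with a simple strict-clock argument. This is an assumption made for illustration: real M. tuberculosis clock rates can vary across lineages and branches. If a terminal branch spans t years and the clock rate is μ SNPs per genome per year, the number of SNPs s on that branch is approximately Poisson with mean μt. Comparing SNP-scaled branch lengths therefore confounds elapsed time with clock rate, whereas rescaling by a correctly estimated rate recovers time.

    ```latex
    % Strict-clock sketch (assumption): branch duration t (years),
    % clock rate \mu (SNPs per genome per year), SNP count s
    s \sim \mathrm{Poisson}(\mu t), \qquad \mathbb{E}[s] = \mu t

    % Two lineages with the same branch duration t but \mu_A = 2\mu_B
    \mathbb{E}[s_A] = 2\,\mathbb{E}[s_B]

    % Time-scaled branch length obtained with the correct rate
    \hat{t} = s/\mu \quad\Rightarrow\quad \mathbb{E}[\hat{t}_A] = \mathbb{E}[\hat{t}_B] = t
    ```

    Under these assumptions, a faster clock lengthens SNP-scaled terminal branches (and lowers clustering at a fixed SNP threshold) without any change in transmission, while branch lengths on a correctly dated tree are unaffected by the difference in rate.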