Classification of unsequenced Mycobacterium tuberculosis strains in a high-burden setting using a pairwise logistic regression approach

This article has been Reviewed by the following groups


Abstract

Over the past three decades, molecular epidemiological studies have provided new opportunities to investigate the transmission dynamics of M. tuberculosis. In most studies, a sizable fraction of individuals with notified tuberculosis cannot be included, either because they do not have culture-positive disease (and thus do not have specimens available for molecular typing) or because resources for conducting sequencing are limited. A recent study introduced a regression-based approach for inferring the membership of unsequenced tuberculosis cases in transmission clusters based on host demographic and epidemiological data. This method identified the most likely cluster to which an unsequenced strain belonged with an accuracy of 35%, although in a low-burden setting where a large fraction of cases occurred among foreign-born migrants. Here, we apply a similar model to M. tuberculosis whole-genome sequencing (WGS) data from the Republic of Moldova, a setting of relatively high local transmission. Using a maximum cluster span of ~40 SNPs and a minimum cluster size of n ≥ 10, the model performed best, predicting the specific cluster to which each clustered case belonged with an accuracy of 17.2%. In sensitivity analyses, neither a more restrictive (~20 SNP) nor a more permissive (~80 SNP) threshold improved performance, whereas increasing the minimum cluster size improved prediction accuracy. These findings highlight the challenges of transmission inference in high-burden settings like Moldova.
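The pairwise idea described in the abstract can be illustrated with a minimal sketch. It is not the lr2cluster implementation: the pairwise covariates (age difference, same sex, same district), the plain gradient-descent fit, the rule of averaging pairwise probabilities within each cluster, and every function name (`pair_features`, `fit_pairwise_logistic`, `assign_cluster`, `loo_accuracy`) are hypothetical choices made for illustration. A logistic model is fitted to pairs of sequenced cases, with the outcome being whether the two cases share a cluster; a new case is then scored against every clustered case and assigned to the cluster with the highest mean predicted probability. The leave-one-out estimate of how often the top-ranked cluster is correct is analogous to the accuracies quoted above, although the study's own validation scheme may differ.

```python
import numpy as np
from itertools import combinations

# Hypothetical case record: (age in years, sex, district). Not data from the study.


def pair_features(a, b):
    """Pairwise covariates for two cases (an illustrative choice, not lr2cluster's)."""
    return np.array([
        1.0,                          # intercept
        abs(a[0] - b[0]) / 10.0,      # age difference in decades
        float(a[1] == b[1]),          # same sex
        float(a[2] == b[2]),          # same district
    ])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_pairwise_logistic(cases, clusters, lr=0.1, n_iter=2000):
    """Fit P(same cluster | pair covariates) by batch gradient descent on the log-loss."""
    pairs = list(combinations(range(len(cases)), 2))
    X = np.array([pair_features(cases[i], cases[j]) for i, j in pairs])
    y = np.array([float(clusters[i] == clusters[j]) for i, j in pairs])
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w


def assign_cluster(new_case, cases, clusters, w):
    """Score a new case against every clustered case; return the top cluster and
    the per-cluster mean probabilities, normalised to sum to 1."""
    by_cluster = {}
    for case, c in zip(cases, clusters):
        p = sigmoid(pair_features(new_case, case) @ w)
        by_cluster.setdefault(c, []).append(p)
    means = {c: float(np.mean(ps)) for c, ps in by_cluster.items()}
    total = sum(means.values())
    probs = {c: m / total for c, m in means.items()}
    return max(probs, key=probs.get), probs


def loo_accuracy(cases, clusters):
    """Fraction of cases whose top-ranked cluster, when held out, is their true cluster."""
    hits = 0
    for k in range(len(cases)):
        rest = cases[:k] + cases[k + 1:]
        rest_clusters = clusters[:k] + clusters[k + 1:]
        w = fit_pairwise_logistic(rest, rest_clusters)
        best, _ = assign_cluster(cases[k], rest, rest_clusters, w)
        hits += best == clusters[k]
    return hits / len(cases)


if __name__ == "__main__":
    cases = [(30, "M", "A"), (35, "M", "A"), (32, "F", "A"),
             (60, "F", "B"), (65, "M", "B"), (58, "F", "B")]
    clusters = ["c1", "c1", "c1", "c2", "c2", "c2"]
    w = fit_pairwise_logistic(cases, clusters)
    print(assign_cluster((33, "M", "A"), cases, clusters, w))
    print(loo_accuracy(cases, clusters))
```

In this toy data the district alone separates the two clusters, so assignment is easy; the abstract's point is that in a high-transmission setting such as Moldova, host covariates separate clusters far less cleanly.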

Article activity feed

  1. Comments to Author

    Dear Authors,

    There are several notes regarding your manuscript, as follows.

    Title: I suggest you make a few changes that would be more appropriate to the study.

    Abstract: The abstract needs some changes and rearrangement of …. "Over the past three decades, molecular epidemiological studies have provided new opportunities to investigate the transmission dynamics of M. tuberculosis. In most studies, a sizable fraction of individuals with notified tuberculosis cannot be included, either because they do not have culture-positive disease (and thus do not have specimens available for molecular typing) or because resources for conducting sequencing are limited. A recent study introduced a regression-based approach for inferring the membership of unsequenced tuberculosis cases into transmission clusters based on host demographic and epidemiological data. This method was able to identify the most likely cluster to which an unsequenced strain belonged with an accuracy of 35%, though in a low burden setting where a large fraction of cases occurred among foreign-born migrants. Here, we apply a si…"

    This paragraph must be at the end of the research, before the references. Data Summary: "All sequencing data in this study are available in BioProject (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA736718). Individual accession numbers for sequences used in this analysis (n = 1582) are included in a supplementary file (Supplementary Table 1). We use the R package 'lr2cluster' to produce the results reported in this study. All scripts and data to reproduce the findin…"

    Introduction: This section needs to be written in the passive ….. "The application of whole genome sequencing (WGS) has allowed for high resolution typing and drug-resistant profiling of M. tuberculosis (M.tb) sequences, transforming our understanding of the transmission (1) and global dispersion of tuberculosis (TB) (2). An important use of WGS data to infer transmission is by grouping similar sequences into clusters, which may be indicative of recent transmission between hosts (3). The simplest way is to assign individuals with TB to clusters based on the number of single nucleotide polymorphisms (SNPs) by which two strains differ (4,5). Once clusters are defined, demographic and epidemiological data can be leveraged to understand factors associated with recent transmission (1,6) and to support public health interventions (7).

    "One challenge of most molecular epidemiological approaches is the need to isolate mycobacterial DNA from culture prior to sequencing. Globally, only about 60% of pulmonary M.tb cases are diagnosed based on a positive microbiological test (8). The high proportion of unsequenced TB infections with microbiological confirmation limits the completeness of WGS datasets, which may compromise inference about transmission in areas where many cases are not culture positive or for whom sequencing is not performed.

    "A recent study introduced a pairwise logistic regression model for predicting membership in M.tb transmission clusters using data collected in Valencia, Spain (9). Here, we adapted this model and applied it to data collected in a large country-wide study of TB transmission in the Republic of Moldova."

    The introduction should also cite recent studies and state the objectives of the study.

    Methodology: It would be better to write the methods in several paragraphs. Please add more details for each method.

    Results: The results should be written in more detail.

    Discussion: Add recent references; it is very short.

    Conclusion: ….. Rewrite the conclusion with numbered

    Bibliography/References: Add more recent references relevant to the field of study. It is recommended to use reference-management software such as Mendeley, or a similar modern tool, to manage the bibliography.
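    The simplest clustering rule in the quoted introduction, linking two cases whenever their strains differ by at most a chosen number of SNPs, can be sketched as single-linkage clustering with a union-find structure. This is an illustrative assumption, not the study's procedure: the abstract describes a maximum cluster span, whereas single linkage can chain together cases whose own pairwise distance exceeds the threshold. The function name `snp_clusters`, its `min_size` filter (mirroring the n ≥ 10 cutoff in the abstract), and the toy distance matrix are hypothetical.

```python
def snp_clusters(dist, threshold, min_size=1):
    """Single-linkage clusters from a symmetric pairwise SNP-distance matrix.

    Cases i and j are linked when dist[i][j] <= threshold; clusters smaller
    than min_size are dropped (treated as unclustered).
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]


if __name__ == "__main__":
    # Hypothetical pairwise SNP distances for five cases.
    d = [
        [0, 3, 8, 40, 41],
        [3, 0, 6, 39, 42],
        [8, 6, 0, 45, 44],
        [40, 39, 45, 0, 2],
        [41, 42, 44, 2, 0],
    ]
    print(sorted(snp_clusters(d, 5)))
    print(sorted(snp_clusters(d, 6)))
```

    With a threshold of 6 SNPs, cases 0 and 2 end up in the same cluster through case 1 even though they differ by 8 SNPs; a maximum-span rule would not allow that, which is one reason the choice of clustering criterion matters.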

    Please rate the manuscript for methodological rigour

    Poor

    Please rate the quality of the presentation and structure of the manuscript

    Poor

    To what extent are the conclusions supported by the data?

    Not at all

    Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?

    No

    Is there a potential financial or other conflict of interest between yourself and the author(s)?

    No

    If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?

    Yes

  2. Comments to Author

    The paper 'Classification of unsequenced Mycobacterium tuberculosis strains in a high-burden setting using a pairwise logistic regression approach' by Rancu et al. represents a worthwhile effort to assess whether host demographic and epidemiological data are useful for assigning unsequenced tuberculosis cases to transmission clusters in lieu of sequencing data. The authors find that this does not work very well at all in a high-burden setting. In my opinion, two aspects of the paper need clarification:

    1) The authors were inspired by a paper by Susvitasari et al. from the lower-incidence setting of Valencia, where the logistic regression approach apparently worked better. As the granularity of epidemiological information is expected to be important for model performance in the absence of WGS data, I would like to see a discussion/comparison of the data going into the model in the current paper compared with the Valencia paper.

    2) The paper states that putative transmission networks were delineated using three patristic distance thresholds, representing the maximum phylogenetic distance between cases in the same cluster: 5e-4, 1e-3, and 2e-3 substitutions/site, corresponding to ~20, ~40, and ~80 SNPs, respectively. I found these calculations confusing, as the specified number of substitutions/site translates to thousands of SNPs if one considers the Mtb genome size of > 4 Mbp. Presumably, the authors have a variable-sites-only alignment at hand, which formed the basis for the phylogenetic analyses. This needs to be cleaned up a bit, and at least a cursory description of the analysis steps taken to generate the alignment should be provided, as the specified substitutions/site values are not meaningful outside the setting of this specific paper, even though the SNP numbers make sense.

    I believe the paper showcases quite effectively how difficult transmission inference is in the absence of WGS data, and it will represent a useful addition to the field if the above concerns can be handled in a satisfactory manner.
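    The reviewer's second point can be checked with simple arithmetic. Assuming a patristic distance d (substitutions/site) converts to an approximate SNP count by multiplying by the number of alignment columns L, d × L over the ~4.41 Mbp H37Rv reference gives thousands of SNPs, while each of the three reported pairs (5e-4 with ~20, 1e-3 with ~40, 2e-3 with ~80) implies the same L of about 40,000 columns, consistent with the reviewer's guess of a variable-sites-only alignment. The ~40,000 figure is derived here, not stated in the manuscript; the linear conversion ignores model-based corrections to branch lengths, and the function names are hypothetical.

```python
# Length of the H37Rv reference genome (NC_000962.3), in base pairs.
H37RV_LENGTH = 4_411_532

# Patristic thresholds (substitutions/site) and the SNP counts reported for them.
THRESHOLDS = {5e-4: 20, 1e-3: 40, 2e-3: 80}


def snps_from_distance(distance_per_site, n_sites):
    """Approximate SNP count: distance per site times number of alignment columns."""
    return distance_per_site * n_sites


def implied_alignment_length(distance_per_site, snps):
    """Alignment length at which a per-site distance corresponds to a SNP count."""
    return snps / distance_per_site


for d, snps in THRESHOLDS.items():
    genome_wide = snps_from_distance(d, H37RV_LENGTH)
    n_sites = implied_alignment_length(d, snps)
    print(f"{d:g} subst/site: ~{genome_wide:,.0f} SNPs over the full genome; "
          f"~{snps} SNPs implies ~{n_sites:,.0f} alignment columns")
```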

    Please rate the manuscript for methodological rigour

    Good

    Please rate the quality of the presentation and structure of the manuscript

    Very good

    To what extent are the conclusions supported by the data?

    Partially support

    Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?

    No

    Is there a potential financial or other conflict of interest between yourself and the author(s)?

    No

    If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?

    Yes