Copy number variation analysis of 9,482 Mycobacterium tuberculosis isolates identifies lineage-specific molecular determinants
Abstract
Background
Clinical manifestations of tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), show lineage-specific differences to which genetic polymorphisms such as phylo-single nucleotide variations (PhyloSNPs) and insertions or deletions (INDELs) contribute. Intragenomic rearrangement events, such as gene duplications and deletions, may also cause gene copy number differences in Mtb and thereby contribute to lineage-specific phenotypic variation, but any such contribution remains poorly understood.
Results
Relative gene copy number differences were determined in high-quality, publicly available whole genome sequencing datasets of 9,482 clinical Mtb isolates by repurposing and modifying an RNA-seq data analysis pipeline. The pipeline comprised read alignment, sorting by coordinate, GC bias correction, and variance-stabilising transformation. With this strategy, lineage-specific clusters showed maximum separation in two principal components, which captured ∼54% of the variability. Unsupervised hierarchical clustering of the top 100 genes and pairwise comparisons between Mtb lineages revealed an overlapping subset of genes (n=42) with significantly perturbed copy numbers (Benjamini-Hochberg adjusted P-value < 0.05 and |log2(drug-resistant/sensitive)| > 1). These 42 genes formed multiple tandem gene clusters and are known to be involved in virulence, pathogenicity and defence response to invading phages. A separate comparison showed significantly higher copy numbers of phage genes and of a recently reported druggable target, Rv1525, in pre-extensively and extensively drug-resistant (Pre-XDR, XDR) isolates than in drug-sensitive clinical Mtb isolates.
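The significance filter described above (adjusted P-value < 0.05 together with an absolute log2 ratio > 1) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy gene labels and all numbers below are hypothetical, and the sketch assumes that a per-gene mean normalised depth on a linear scale and a raw p-value are already available for the two groups being compared.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(genes, mean_a, mean_b, p_values, alpha=0.05, min_abs_log2=1.0):
    """Keep genes with adjusted p < alpha and |log2(mean_a / mean_b)| > min_abs_log2.

    Assumes all means are strictly positive (hypothetical normalised depths).
    """
    adjusted = benjamini_hochberg(p_values)
    selected = []
    for gene, a, b, q in zip(genes, mean_a, mean_b, adjusted):
        log2_ratio = math.log2(a / b)
        if q < alpha and abs(log2_ratio) > min_abs_log2:
            selected.append((gene, round(log2_ratio, 3), q))
    return selected


# Hypothetical toy values, not data from the study.
genes = ["geneA", "geneB", "geneC", "geneD"]
group_a = [8.0, 2.1, 1.0, 3.0]   # e.g. mean normalised depth in group A
group_b = [2.0, 2.0, 4.0, 2.9]   # e.g. mean normalised depth in group B
raw_p = [0.001, 0.40, 0.02, 0.03]

hits = select_genes(genes, group_a, group_b, raw_p)
print([gene for gene, _, _ in hits])
```

In this toy input, geneD passes the adjusted P-value cut-off but is dropped because its ratio is close to 1, which is the reason for combining both criteria.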
Conclusion
The identified gene sets in Mtb clinical isolates may be useful targets for lineage-specific therapeutics and diagnostics development.