In-silico evidence of non-operonic fusion transcripts in Mycobacterium tuberculosis: algorithm optimization and signatures of genome plasticity
Abstract
Background
The genome of Mycobacterium tuberculosis (Mtb) is known for its stable nature. However, it also contains transposases, redundant genes, repetitive DNA sequences, integrases, and remnants of lysogenized mycobacteriophages. These elements can drive intragenomic recombination and thereby give rise to fusion transcripts. The present study aimed to identify signatures of long-distance gene fusion transcripts in RNA-seq data of clinical Mtb isolates.
Methodology
Three approaches based on separate principles (split read alignment, repurposing STAR chimera, and transcript de novo assembly) were evaluated. Intersections of fusion calls among the three approaches were compared, and the intersection that showed maximum performance was used to detect fusions in real RNA-seq datasets of Mtb.
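The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the method labels, and the toy gene-pair identifiers (`g1`, `g2`, ...) are all hypothetical, and it assumes each caller's output has already been reduced to a set of gene pairs and that a truth set of known fusions is available for benchmarking.

```python
from itertools import combinations


def as_pairs(calls):
    """Store each fusion as an unordered gene pair so that A--B equals B--A."""
    return {frozenset(pair) for pair in calls}


def f1_score(predicted, truth):
    """Harmonic mean of precision and recall for a set of fusion calls."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


def best_intersection(calls_by_method, truth):
    """Score every intersection of two or more methods and return the best.

    Returns a tuple (method_names, f1, shared_calls).
    """
    best = None
    names = sorted(calls_by_method)
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            shared = set.intersection(*(calls_by_method[n] for n in combo))
            score = f1_score(shared, truth)
            if best is None or score > best[1]:
                best = (combo, score, shared)
    return best


# Toy benchmark with hypothetical gene labels (not data from the study).
truth = as_pairs([("g1", "g2"), ("g3", "g4"), ("g5", "g6")])
calls_by_method = {
    "split_read": as_pairs([("g1", "g2"), ("g3", "g4"), ("g5", "g6"), ("g7", "g8")]),
    "star_chimera": as_pairs([("g2", "g1"), ("g3", "g4"), ("g5", "g6"), ("g9", "g10")]),
    "assembly": as_pairs([("g1", "g2"), ("g11", "g12")]),
}

methods, score, shared = best_intersection(calls_by_method, truth)
print(methods, round(score, 3), len(shared))
```

Intersecting callers removes false positives that only one method reports, at the cost of losing any true fusion that one of the intersected methods misses. Scoring by F1 balances those two effects.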
Results
The intersection of the split-read approach and the repurposed STAR chimera showed high performance (F1 > 0.9). Sequence characteristics, clustering, and gene burden of operonic and long-distance gene fusions were consistent between the two independent real datasets, showing the robustness of the optimized strategy. Fusion transcripts showed lineage specificity and signatures of indirect involvement of transposases and transposition accessory genes (Rv1199c, Rv2512c, Rv3115, Rv0395, Rv2808, and Rv3327) in the intragenomic recombination that gives rise to fusion transcripts. The fusions were mainly within transposases, PPE, and PE_PGRS family proteins, and some isolated fusions involved genes of the MoCo pathway, vesicle transport, and lipid turnover.
Conclusions
The observed fusions are likely driven by natural recombination, resulting in the formation of fusion proteins, coregulating proteins, or gene disruption. The study shows that the Mtb genome, especially that of clinical isolates, may not be as stable as believed.
Importance
The Mtb genome is believed to be stable, clonal, and immune to horizontal gene transfer (HGT), and thus only single-nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs) are thought to drive its evolution. However, the drastic differences in phenotypes such as growth kinetics, virulence, and metabolic rate observed in clinical isolates compared to laboratory strains cannot be attributed entirely to SNPs and INDELs. The Mtb genome contains transposases and other accessory genes that can drive intragenomic recombination, bringing distant genes closer together. As a result, fusion transcripts may occur. Growing evidence and our previous contributions also suggest changes in gene repertoires and gene copy numbers, which are likely driven by intragenomic recombination events as well. This study presents the optimization of a robust and easy-to-implement fusion calling algorithm using traditional bioinformatic tools. Using this algorithm, we report fusion transcripts of non-operonic genes in the RNA-seq data of clinical Mtb isolates.