Understanding the epidemiology and pathogenesis of Mycobacterium tuberculosis with non-redundant pangenome and population genetics
Abstract
Tuberculosis is a major public health threat, claiming more than one million lives every year. Many challenges remain in defeating this deadly infectious disease, which underscores the importance of a thorough understanding of the biology of its causative agent, Mycobacterium tuberculosis (MTB). We generated a non-redundant pangenome of 420 epidemic MTB strains from China. We estimate that MTB strains have a pangenome of 4,278 genes encoding 4,183 proteins, of which 3,438 are core genes. However, because of 99,694 interruptions in 2,447 coding genes, only 1,651 genes may be translated in all samples, which dramatically reduces the number of active core genes. Of these interruptions, 67,315 (67.52%) could be attributed to genetic variants detectable with currently available tools, and more than half of these are due to structural variations, mostly small indels. We further describe the differential evolutionary patterns of genes under the influence of selective pressure, population structure and background selection. While selective pressure acts ubiquitously on these coding genes, evolutionary adaptation occurs primarily in 1,313 genes. Genes located in the cell wall and membrane region are under the strongest selective pressure, while biological processes including regulation of transcription, translation and regulation of growth are under the strongest background selection in MTB. Fatty acid metabolism may be an outstanding example of evolutionary adaptation in MTB under current selective pressure. This study provides a comprehensive view of the genetic diversity and evolutionary patterns of coding genes in MTB, which may deepen our understanding of its epidemiology and pathogenicity.
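The distinction between core genes (present in every strain) and genes that may be translated in all samples is central to the argument above. The following Python sketch illustrates that distinction on toy data under stated assumptions. It is not the authors' pipeline. The 100% presence threshold for "core", the rule that any interruption in any sample removes a gene from the intact set, and the function names `core_genes`, `intact_core_genes` and `percent` are all hypothetical choices made for illustration. The last helper simply reproduces the 67.52% figure from the counts reported in the abstract.

```python
# Hypothetical sketch, not the authors' method. Assumptions:
#   - a gene is "core" if it is present in every sample (100% threshold);
#   - a core gene counts as "intact" only if no sample carries an
#     interruption (e.g. a frameshift indel or premature stop) in it.
from typing import Dict, Set


def core_genes(presence: Dict[str, Set[str]]) -> Set[str]:
    """Return genes present in every sample (presence: sample -> genes)."""
    gene_sets = list(presence.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def intact_core_genes(presence: Dict[str, Set[str]],
                      interrupted: Dict[str, Set[str]]) -> Set[str]:
    """Return core genes with no interruption in any sample.

    interrupted: sample -> genes whose coding sequence is interrupted.
    """
    # Union of interrupted genes across all samples (empty if no samples).
    broken = set().union(*interrupted.values())
    return core_genes(presence) - broken


def percent(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


if __name__ == "__main__":
    # Toy data: three hypothetical samples.
    presence = {
        "s1": {"geneA", "geneB", "geneC"},
        "s2": {"geneA", "geneB", "geneC", "geneD"},
        "s3": {"geneA", "geneB", "geneC"},
    }
    interrupted = {"s1": set(), "s2": {"geneB"}, "s3": set()}
    print(sorted(core_genes(presence)))                      # core genes
    print(sorted(intact_core_genes(presence, interrupted)))  # intact core
    # Share of interruptions classified by available tools (abstract counts).
    print(percent(67315, 99694))
```

In the toy data, geneB is core but interrupted in one sample, so it drops out of the intact set. This is the same mechanism by which, at scale, interruptions shrink the set of core genes that may be translated in all samples. A real analysis would also need to decide how to treat partial interruptions and may use a core threshold below 100%.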