Complete genomes reveal the full extent of Mycobacterium tuberculosis complex diversity across evolutionary scales
Abstract
Advances in short-read sequencing have enhanced our understanding of the Mycobacterium tuberculosis complex (MTBC), but short reads fail to capture its complete genomic diversity. We applied long-read sequencing to 216 isolates from the Valencia Region (Spain) and generated high-quality, complete genomes, revealing detailed insights into MTBC evolution across timescales. Complete genome comparisons increased the estimated evolutionary rate by 1.5-fold, resulting in a median of 312 (–1 to 792) additional SNPs per pairwise comparison. Multiple diversity hotspots were identified, mostly in the pe/ppe genes and driven by gene conversion. However, most PE/PPE epitopes were hyperconserved, with notable exceptions involving vaccine candidates. Incorporating previously undetected SNPs and indels improved resolution in transmission analyses. Furthermore, patient-specific reference mapping validated only 5–10% of the within-host diversity detected by standard pipelines, indicating substantial overestimation in previous studies. These findings expand our view of MTBC diversity and have important implications for understanding host-pathogen interactions, epidemiology, and transmission dynamics.