Complete genomes reveal the full extent of Mycobacterium tuberculosis complex diversity across evolutionary scales
Abstract
Advances in short-read sequencing have enhanced our understanding of the Mycobacterium tuberculosis complex (MTBC) but fail to capture its complete genomic diversity. We applied long-read sequencing to 216 isolates from the Valencia Region (Spain) and generated high-quality, complete genomes, revealing detailed insights into MTBC evolution across timescales. Complete genome comparisons increased the estimated evolutionary rate by 1.5-fold, resulting in a median of 312 (–1 to 792) additional SNPs per pairwise comparison. Multiple diversity hotspots were identified, mostly in the pe/ppe genes and driven by gene conversion. However, most PE/PPE epitopes were hyperconserved, with notable exceptions involving vaccine candidates. Incorporating previously undetected SNPs and indels improved resolution in transmission analyses. Furthermore, patient-specific reference mapping validated only 5–10% of the within-host diversity detected by standard pipelines, indicating substantial overestimation in previous studies. These findings expand our view of MTBC diversity and have important implications for understanding host-pathogen interactions, epidemiology, and transmission dynamics.