Large contribution of repeats to genetic variation in a transmission cluster of Mycobacterium tuberculosis

Abstract

Repeats are the most diverse and dynamic, but also the least well understood, component of microbial genomes. For all we know, repeat-associated mutations such as duplications, deletions, inversions, and gene conversion might be as common as point mutations, but because of short-read myopia and methodological bias they have received much less attention. Long-read sequencing opens up the prospect of resolving repeats and systematically investigating the mutations they induce. For this study, we assembled the genomes of 16 closely related strains of the bacterial pathogen Mycobacterium tuberculosis from PacBio HiFi reads, with the aim of characterizing the full spectrum of DNA polymorphisms. We find that complete and accurate genomes can be assembled from HiFi reads, with read size being the main limitation in the presence of duplications. By combining a reference-free pangenome graph with extensive repeat annotation, we identified 110 variants, 58 of which can be assigned to repeat-associated mutational mechanisms such as strand slippage and homologous recombination. While recombination events are less frequent than point mutations, they can affect large regions and introduce multiple variants at once, as shown by three gene conversion events and a duplication of 7.3 kb that involve ppe18 and ppe57, two genes possibly involved in immune subversion. Our study shows that the contribution of repeat-associated mechanisms of mutation can be similar to that of point mutations at the microevolutionary scale of an outbreak. A large reservoir of unstudied genetic variation in this "monomorphic" bacterial pathogen awaits investigation.
