VAMPIRE: Analyzing variation and motif pattern in tandem repeats
Abstract
Tandem repeats (TRs) are pervasive in eukaryotic genomes and play key roles in genome organization, evolution, and function, particularly in complex regions such as centromeres and subtelomeres. Although long-read sequencing technologies have improved the resolution of these regions, existing methods remain limited in their ability to systematically and accurately characterize large-scale TRs. Here, we introduce VAMPIRE, a k-mer–based computational tool for comprehensive TR discovery, annotation, and quantification. Unlike previous methods, VAMPIRE enables reference-free, fine-grained decomposition of both simple and complex TRs, capturing motif variation in sequence, length, and structure with high sensitivity and scalability. Applied to complete telomere-to-telomere (T2T) human and nonhuman primate (NHP) genome assemblies, VAMPIRE reveals previously unrecognized high-order repeat inversions within human centromeres—an underappreciated evolutionary mechanism contributing to centromere diversity. Additionally, the tool identifies lineage-specific and expanded TRs, including human-specific STR/VNTR expansions and NHP-specific subtelomeric heterochromatin (e.g., pCht/StSat), underscoring their dynamic turnover and structural complexity. VAMPIRE provides a robust and scalable framework for TR analysis in the era of long-read sequencing, with broad utility across human genetics, evolutionary biology, and the study of complex TRs in non-model organisms.
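To make the idea of k-mer-based motif decomposition concrete, the following is a minimal illustrative sketch in Python. It is not VAMPIRE's algorithm or interface, which the abstract does not describe. The function names `estimate_period` and `decompose`, the parameter `k`, and the toy telomere-like sequence are all assumptions introduced here for illustration. The sketch guesses the repeat unit length from the most common spacing between recurrences of the same k-mer, then cuts the array into units and tallies motif variants. It assumes the array starts at position 0 and contains substitutions only, with no insertions or deletions.

```python
from collections import Counter


def estimate_period(seq, k=3):
    """Guess the repeat unit length as the most common gap between
    consecutive occurrences of the same k-mer (hypothetical helper)."""
    last_seen = {}   # k-mer -> index of its most recent occurrence
    gaps = Counter() # gap length -> number of times observed
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            gaps[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    if not gaps:
        return None  # no k-mer recurs, so no repeat signal
    return gaps.most_common(1)[0][0]


def decompose(seq, period):
    """Cut the sequence into consecutive units of `period` bases and
    count each distinct unit. Assumes the array starts at index 0 and
    has no indels; a trailing partial unit is ignored."""
    units = [seq[i:i + period]
             for i in range(0, len(seq) - period + 1, period)]
    return Counter(units)


# Toy array: ten 6-bp units, one of which carries a single substitution.
toy = "TTAGGG" * 5 + "TTAGGC" + "TTAGGG" * 4
period = estimate_period(toy)
motif_counts = decompose(toy, period)
```

On this toy input the canonical motif and its single-base variant come out as separate entries in `motif_counts`, which is the kind of motif-level variation the abstract refers to. Real TR arrays with indels, inversions, or nested higher-order repeats need far more than fixed-width slicing.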