A population-scale map of human tandem repeat composition and mutation dynamics from long-read assemblies
Abstract
Tandem repeats (TRs) are highly mutable genomic elements composed of repeated motifs that drive genetic diversity and contribute to disease. Because resolution at population scale has been limited, their mutation dynamics remain incompletely understood. Here, we present TRCompDB, a reference database constructed from 832 human genomes that annotates the compositional diversity and population structure of 4.4 million TR loci. Leveraging SNP-based genealogical trees, we estimated locus-specific mutation rates for 4,037,274 TRs, revealing rates ranging from 10⁻¹⁰ to 10⁻³ across TRCompDB loci. In addition to TR length and coding potential, mutation rate was influenced by recombination and the wider duplication context. Moreover, a subset of AT-rich pentanucleotide repeats, including loci linked to familial adult myoclonic epilepsy and spinocerebellar ataxia, showed extreme mutation rates. Beyond single-motif changes, 29,964 loci exhibited higher-order duplication patterns that increased allelic complexity, among which 3,351 variable-number tandem repeats (VNTRs) showed evolutionary recurrence. These results establish TRCompDB as a resource for studying global patterns of tandem repeat evolution and their contribution to human disease.
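To make the idea of a tree-based, locus-specific mutation rate concrete, the following is a minimal sketch and not the authors' method, which the abstract does not describe. It assumes a rooted binary genealogical tree with branch lengths in generations and a TR copy number at each leaf, counts the minimum number of allele changes by Fitch parsimony, and divides by the total branch length. The `Node` class, the function names, the toy tree, and the per-generation unit are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    """Node of a rooted binary genealogical tree (hypothetical structure)."""
    branch_length: float = 0.0        # generations to the parent; 0 for the root
    allele: Optional[int] = None      # TR copy number, set on leaves only
    children: List["Node"] = field(default_factory=list)


def fitch(node: Node) -> Tuple[Set[int], int, float]:
    """Return (candidate states, minimum changes, total branch length) below node."""
    if not node.children:
        return {node.allele}, 0, 0.0
    states: Optional[Set[int]] = None
    changes = 0
    length = 0.0
    for child in node.children:
        c_states, c_changes, c_length = fitch(child)
        changes += c_changes
        length += c_length + child.branch_length
        if states is None:
            states = c_states
        elif states & c_states:
            states = states & c_states   # children agree: no change needed
        else:
            states = states | c_states   # children disagree: one change
            changes += 1
    return states, changes, length


def estimate_rate(root: Node) -> float:
    """Parsimony estimate: minimum allele changes per generation of branch length."""
    _, changes, length = fitch(root)
    return changes / length if length > 0 else float("nan")


# Toy tree ((A:100, B:100):50, C:150) with copy numbers A=10, B=11, C=10.
inner = Node(branch_length=50, children=[Node(100, 10), Node(100, 11)])
root = Node(children=[inner, Node(150, 10)])
rate = estimate_rate(root)  # 1 change over 400 generations of branch length
```

Parsimony gives a lower bound on the number of events, so an estimate of this kind would understate the rate at highly mutable loci where the same allele arises repeatedly; a likelihood model over the tree would be the natural refinement.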