The Genetic and Evolutionary Landscape of Pentanucleotide Tandem Repeats in Human Genomes
Abstract
Tandem repeats are a class of genetic variation characterized by repetitions of DNA motifs of diverse sequences and lengths. Pentameric motifs, in particular, do not align with the protein-coding reading frame, often confining them to the less extensively studied non-coding portion of the genome. Intronic pentanucleotide tandem repeats are associated with 11 neurological disorders. Their pathogenic genotypes are primarily characterized by large length expansions and, unlike those of most repeat expansion disorders, require a deviation from the reference motif. The population-level variability of pentameric repeats remains poorly understood due to technical limitations of short-read sequencing technologies. To address this knowledge gap, we genotyped 28,446 pentanucleotide repeat loci in 1,027 long-read PacBio HiFi samples from self-identified Black and African American participants in the All of Us Research Program. We developed new algorithms for tandem repeat decomposition, characterization, and visualization to facilitate this analysis. Our findings reveal extensive genome-wide heterogeneity in repeat length and sequence composition. Alleles with DNA sequences containing segments of distinct motifs were observed in 15% of loci. Additionally, 8% of loci exhibited multimodal repeat length distributions, in which distinct sequence compositions were often associated with distinct length ranges. Repeat loci were highly enriched in proximity to transposable elements, including 68% mapping to Alu elements, a family of retrotransposons specific to primates. Comparative analysis of the reference sequences from 30 species, including 27 primates, suggests that most pentanucleotide repeats fully emerged in a common ancestor of humans and other ape species, or even within more closely related hominin lineages. As an example, we describe a highly polymorphic and recently evolved repeat locus in the ABCA1 gene, a major regulator of cellular cholesterol and phospholipid homeostasis.
This study provides novel and comprehensive insights into the evolutionary formation of pentamer tandem repeat loci and their extensive variability across human genomes.
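To make the notion of an allele "containing segments of distinct motifs" concrete, the following is a minimal illustrative sketch and not the decomposition algorithm developed in this study, which the abstract does not describe. It assumes a fixed reading frame with no insertions or deletions and simply collapses consecutive identical 5-mers into segments. The function names (`canonical`, `decompose`, `distinct_motifs`) and the toy allele are hypothetical.

```python
from itertools import groupby


def canonical(motif: str) -> str:
    """Return the lexicographically smallest rotation of a motif,
    so that e.g. GAAAA and AAAAG are counted as the same motif."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def decompose(seq: str, k: int = 5) -> list[tuple[str, int]]:
    """Split seq into consecutive k-mers (fixed frame, an assumption of
    this sketch) and collapse runs of identical units into
    (unit, copy_number) segments. Trailing bases that do not fill a
    whole unit are ignored."""
    units = [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]
    return [(unit, sum(1 for _ in run)) for unit, run in groupby(units)]


def distinct_motifs(segments: list[tuple[str, int]]) -> set[str]:
    """Set of rotation-normalized motifs present in a decomposition."""
    return {canonical(unit) for unit, _ in segments}


# Toy allele (hypothetical): three copies of one pentamer, then two of another.
allele = "AAAAG" * 3 + "AAGGG" * 2
segments = decompose(allele)
print(segments)                        # [('AAAAG', 3), ('AAGGG', 2)]
print(len(distinct_motifs(segments)))  # 2, i.e. a mixed-motif allele
```

A fixed-frame split is the simplest possible choice and would fragment real alleles that contain interruptions or phase shifts; handling those cases requires an alignment-aware approach.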