Evolutionary Patterns of Microsatellite Distribution in Cricket Genomes: Insights from Comparative Genomics of Five Gryllidae Species
Abstract
Microsatellites, or simple sequence repeats (SSRs), are valuable markers for studying genome structure, function, and evolution. However, their distribution and characteristics remain largely unexplored in crickets. We performed a genome-wide identification and analysis of perfect (P-SSRs), compound (C-SSRs), and imperfect (I-SSRs) SSRs across five cricket genomes. The total number of SSRs per genome ranged from 2,350,765 to 3,299,527, representing 5.37–7.27% of the genomes. In every genomic region examined (whole genome, intergenic regions, introns, and coding sequences), abundance followed the order I-SSRs > P-SSRs > C-SSRs. Total SSR number, length, abundance, and density showed no significant correlation with genome size. Trinucleotide repeats were consistently the most common P-SSR type. The (AAT)n motif predominated in whole genomes, intergenic regions, and introns, while (CCG)n was the most frequent motif in coding sequences. Accordingly, AT-rich repeats dominated non-coding regions, whereas GC-rich repeats were enriched in coding sequences. Coefficient of variation analysis of repeat copy numbers revealed distinct trends in P-SSR distribution across genomic regions and species. Functional annotation of coding sequences containing P-SSRs indicated involvement in binding, signal transduction, and transcription. To our knowledge, this study is the first comprehensive subtribe-level comparative analysis of SSRs in crickets, and it provides new insights into their genomic architecture.
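The identification tool and thresholds behind these counts are not specified here. As a minimal illustrative sketch only, the Python code below shows one common way to detect perfect SSRs and derive the summary statistics mentioned above. It assumes MISA-style minimum copy numbers (10 for mono-, 6 for di-, and 5 for tri- to hexanucleotide motifs), defines abundance as SSRs per Mb and density as SSR base pairs per Mb, and groups motifs into classes by taking the smallest rotation of a motif or its reverse complement (so that ATT, TAA, and ATA all count as (AAT)n). All function names are hypothetical, and the sketch does not attempt compound or imperfect SSR detection.

```python
import statistics

# Assumed minimum copy numbers per motif length (MISA-style; not taken from the study).
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit ('ATAT' is not)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_perfect_ssrs(seq: str):
    """Hypothetical greedy left-to-right scan; returns (start, motif, copies) tuples."""
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        best = None  # (span_bp, motif_length, copies) of the longest qualifying repeat
        for k, min_copies in MIN_COPIES.items():
            motif = seq[i:i + k]
            if len(motif) < k or "N" in motif or not is_primitive(motif):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_copies and (best is None or copies * k > best[0]):
                best = (copies * k, k, copies)
        if best:
            span, k, copies = best
            hits.append((i, seq[i:i + k], copies))
            i += span  # skip past the reported repeat
        else:
            i += 1
    return hits


def summarize(hits, genome_length_bp: int):
    """Assumed definitions: abundance = SSRs per Mb, density = SSR bp per Mb."""
    mb = genome_length_bp / 1e6
    total_bp = sum(len(motif) * copies for _, motif, copies in hits)
    return len(hits) / mb, total_bp / mb


def copy_number_cv(hits):
    """Coefficient of variation (sample SD / mean) of repeat copy numbers."""
    copies = [c for _, _, c in hits]
    if len(copies) < 2:
        return float("nan")
    return statistics.stdev(copies) / statistics.mean(copies)


def canonical_motif(motif: str) -> str:
    """Smallest rotation of the motif or its reverse complement (assumed grouping rule)."""
    rc = motif.translate(COMPLEMENT)[::-1]
    return min(s[j:] + s[:j] for s in (motif, rc) for j in range(len(s)))


if __name__ == "__main__":
    toy = "GG" + "AAT" * 6 + "C" + "CCG" * 5 + "TTT"
    found = find_perfect_ssrs(toy)
    print(found)  # [(2, 'AAT', 6), (21, 'CCG', 5)]
    print(summarize(found, genome_length_bp=1_000_000))
    print(copy_number_cv(found))
    print(canonical_motif("ATT"), canonical_motif("CGG"))
```

Preferring the longest repeat at each position, and skipping non-primitive motifs such as AA or AATAAT, keeps a single tract from being reported under several motif lengths. Real pipelines usually also merge neighbouring tracts into compound SSRs and tolerate mismatches for imperfect SSRs, which this sketch leaves out.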