A complete and near-perfect rhesus macaque reference genome: lessons from subtelomeric repeats and sequencing bias

Abstract

A truly complete, telomere-to-telomere (T2T), and error-free reference genome remains a foundational resource, and a long-standing goal, for unbiased comparative and functional genomics. Although recent T2T assemblies of humans and other primates have made substantial progress, most still contain thousands of base-level errors, particularly within highly repetitive regions. Here, we present T2T-MMU8v2.0, a near-perfect T2T assembly of the rhesus macaque (Macaca mulatta) with the highest base-level accuracy reported in a primate genome to date. Using an optimized ONT-only assembly strategy, we identify subtelomeric satellite-rich regions as the principal bottleneck to improving assembly quality, owing to technological biases in long-read sequencing platforms and limitations of current hybrid assembly frameworks. We discover 268 previously unannotated repeat families and resolve ∼8 Mbp of SATR satellite arrays, which show over 99-fold enrichment in historically misassembled subtelomeric regions. These satellites form four distinct genomic architectures, each with its own SATR satellite composition, segmental duplication organization, and epigenetic signature, and all of them differ from the subtelomeric architectures observed in hominid genomes. Notably, in contrast to the largely gene-poor subtelomeric regions of African hominids, the macaque SATR architectures harbor 58 actively transcribed genes, supported by open chromatin and expression data, suggesting gene innovation within these repetitive regions. Functionally, T2T-MMU8v2.0 improves read mappability and accuracy across sequencing platforms, and results in a 19% improvement in transcription start site (TSS) enrichment scores and 5,821 additional chromatin accessibility peaks on average, thereby enhancing variant detection, regulatory annotation, and transcriptomic resolution in population genetic and single-nucleus studies. Together, this work establishes a new benchmark for genomics, offers a roadmap for resolving complex repetitive regions, and reveals previously unrecognized features of subtelomeric genome structure and evolution.
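A "fold enrichment" such as the 99-fold figure for SATR arrays is a ratio of densities: how concentrated a feature is inside a set of regions compared with the genome as a whole. The abstract does not state the exact definition used, so the following is only a minimal sketch, assuming enrichment is computed as the fraction of subtelomeric sequence covered by SATR arrays divided by the genome-wide fraction. The function name `fold_enrichment` and all numbers in the usage line are hypothetical illustrations, not values from the study.

```python
def fold_enrichment(feature_bp_in_region: int, region_bp: int,
                    feature_bp_total: int, genome_bp: int) -> float:
    """Hypothetical helper: ratio of a feature's density inside a region
    to its genome-wide density (assumed definition, not the authors' code)."""
    if region_bp <= 0 or genome_bp <= 0 or feature_bp_total <= 0:
        raise ValueError("lengths and genome-wide feature total must be positive")
    if feature_bp_in_region > feature_bp_total:
        raise ValueError("feature bp inside the region cannot exceed the total")
    observed = feature_bp_in_region / region_bp   # density within the region
    expected = feature_bp_total / genome_bp       # density expected by chance
    return observed / expected


# Made-up example: 4 Mbp of an 8 Mbp satellite family falls within 20 Mbp
# of subtelomeric sequence in a 3 Gbp genome.
fe = fold_enrichment(4_000_000, 20_000_000, 8_000_000, 3_000_000_000)
print(round(fe, 1))  # 75.0
```

A value of 1 means the feature is distributed in proportion to region length; values well above 1 indicate concentration in the region, which is how a figure like "over 99-fold" would be read under this assumed definition.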