A complete and near-perfect rhesus macaque reference genome: lessons from subtelomeric repeats and sequencing bias

Shilong Zhang
Ning Xu
Yong Lu
Yanhong Nie
Zhengtong Li
Luciana de Gennaro
Lianting Fu
Zhendong Zhang
Jieyi Chen
Kaiyue Ma
Xiangyu Yang
Juan Zhang
Matthew T. Schmitz
Francesca Antonacci
Trygve E. Bakken
Mario Ventura
Adam M. Phillippy
Qiang Sun
Yafei Mao

Abstract

A truly complete, telomere-to-telomere (T2T), and error-free reference genome remains a foundational resource, and a long-standing goal, for unbiased comparative and functional genomics. While recent T2T assemblies of humans and other primates have made substantial progress, most still contain thousands of base-level errors, particularly within highly repetitive regions. Here, we present T2T-MMU8v2.0, a near-perfect T2T assembly of the rhesus macaque (Macaca mulatta), representing the highest base-level accuracy reported in a primate genome to date. By employing an optimized ONT-only assembly strategy, we identify subtelomeric satellite-rich regions as the principal bottleneck to improving assembly quality, owing to technological biases in long-read platforms and limitations in current hybrid assembly frameworks. We discover 268 previously unannotated repeat families and resolve ∼8 Mbp of SATR satellite arrays, with over 99-fold enrichment in historically misassembled subtelomeric regions. These satellites form four distinct genomic architectures, each with unique SATR satellite composition, segmental duplication organization, and epigenetic signatures, distinct from the subtelomeric architectures observed in hominid genomes. Notably, in contrast to the largely gene-poor subtelomeric regions in African hominids, the SATR architectures in macaques harbor 58 actively transcribed genes, supported by open chromatin and expression data, suggesting gene innovation within these repetitive regions. Functionally, T2T-MMU8v2.0 improves read mappability and accuracy across sequencing platforms, and yields a 19% improvement in transcription start site enrichment scores and 5,821 additional chromatin accessibility peaks on average, thereby enhancing variant detection, regulatory annotation, and transcriptomic resolution in population genetics or single-nucleus studies.
Together, this work establishes a new benchmark for genomics, offers a roadmap for resolving complex repetitive regions, and reveals previously unrecognized features of subtelomeric genome structure and evolution.

Version published to 10.1101/2025.08.04.668424 on bioRxiv
Aug 4, 2025

Related articles

Nanopore Data-Driven Near-T2T Genome Assembly of Hippophae rhamnoides ssp. mongolica Rousi

This article has 15 authors:
1. Alexander Arkhipov
2. Nadezhda Bolsheva
3. Elena Pushkova
4. Vladislav Babenko
5. Yury Zubarev
6. Vera Kovalenko
7. Fedor Kostromskoy
8. Elizaveta Ivankina
9. Ekaterina Dvorianinova
10. Nikolai Barsukov
11. Daiana Krupskaya
12. Elena Borkhert
13. Ksenia Klimina
14. Nataliya Melnikova
15. Alexey Dmitriev
This article has no evaluations. Latest version: Dec 15, 2025

A Benchmarking Framework to Catalyze Individual Human Genome Projects

This article has 3 authors:
1. Manjushri Kalpande
2. Apoorva Ganesh
3. Subhashini Srinivasan
This article has no evaluations. Latest version: Dec 17, 2025

Human Chromosome 2 Fusion in Hominin Evolution: Cytogenetic Evidence, Drift-Aware Establishment Under Underdominance, and T2T-Era Paleogenomic Audits

This article has 1 author:
1. Mohamed Sacha
This article has no evaluations. Latest version: Jan 30, 2026
