Telomere-to-telomere cattle genome assembly reveals novel introgressions and selection signatures associated with adaptation and energy metabolism
Abstract
Assembly of complete genomes contributes to research on biology and evolution. Here, we present a complete telomere-to-telomere (T2T) genome assembly of male Holstein cattle, designated T2T-cattle1.0-SAAS, encompassing 3.05 Gb of genomic data. Our assembly has a base accuracy of over 99.999%, resolves about 342.36 Mb of previously unassembled regions (PURs), and introduces 641 new genes. We found that the Sat1.723 repeat units of centromeric satellites exhibit the highest enrichment of centromere protein-A (CENP-A) in bovine chromosomes. For the first time, we identified eleven distinct autosomal centromere structures based on satellite arrays and systematically annotated higher-order repeats (HORs) in bovine centromeric regions, revealing intricate organizational patterns. Within the HOR arrays, we identified 518 monomers, the majority of which cluster into six types of known satellite repeats and seven categories of novel repeat sequences. Furthermore, we report for the first time that HOR arrays in the centromeres of 12 autosomes contain palindromic structures. We demonstrated that long terminal repeat (LTR) elements in centromeric regions were inserted significantly later (on average approximately 7.99 million years ago, MYA) than those in non-centromeric regions (on average 27.83 MYA). Additionally, their insertion burst occurred later than in pigs, sheep, and goats, and with a higher insertion frequency. We also identified unusually early insertion events of centromeric LTRs in specific chromosomes. These findings confirm the role of centromeric LTRs in driving the evolution of the bovine genome. Cross-species comparison of DNA methylation patterns in the pseudoautosomal regions (PARs) versus non-PARs revealed interspecific differences on ChrX. The assembly improves the identification of structural variants (SVs), including 8,253 deletions and 4,281 insertions, as well as 843,727 SNPs in PURs.
Our T2T assembly enhances genetic research, particularly the identification of SV introgression from banteng (Bos javanicus) into Chinese zebu, which is implicated in heat adaptation, and of positively selected genes associated with energy metabolism, such as CAMK2B, in Holstein cattle.