A haplotype-resolved reference genome for Eucalyptus grandis
Abstract
Eucalyptus grandis is a hardwood tree used worldwide, as a pure species or hybrid partner, to breed fast-growing plantation forestry crops that serve as feedstocks of timber and lignocellulosic biomass for pulp, paper, biomaterials and biorefinery products. The current v2.0 genome reference for the species (Bartholome et al., 2015; Myburg et al., 2014) served as the first reference for the genus and has helped drive the development of molecular breeding tools for eucalypts. Using PacBio HiFi long reads and Omni-C proximity ligation sequencing, we produced an improved, haplotype-phased assembly (v4.0) for TAG0014, an early-generation selection of E. grandis. The two haplotypes are 571 Mbp (HAP1) and 552 Mbp (HAP2) in size and consist of 37 and 46 contigs, respectively, scaffolded onto 11 chromosomes (contig N50 of 28.9 and 16.7 Mbp). These haplotype assemblies are 70 to 90 Mbp smaller than the diploid v2.0 assembly but capture all except one of the 22 telomeres, suggesting that the previous assembly included substantial redundant sequence. A total of 35,929 (HAP1) and 35,583 (HAP2) gene models were annotated, of which 438 and 472, respectively, contain long introns (>10 kbp) and were previously (v2.0) identified as multiple smaller genes. These and other improvements have increased gene annotation completeness from 93.8% to 99.4% in the v4.0 assembly. We found that 6,493 (HAP1) and 6,346 (HAP2) genes lie within tandem duplicate arrays (18.4% and 17.8% of the total, respectively) and that repeat elements make up >43.8% of the haplotype assemblies. Analysis of synteny between the haplotypes and the E. grandis v2.0 reference genome revealed extensive regions of collinearity, but also some major rearrangements, and provided a preview of population and pan-genome variation in the species.
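The abstract reports contiguity as contig N50. For readers unfamiliar with the metric, the following is a minimal sketch of how N50 is conventionally computed from a list of contig lengths. The function name `n50` and the toy lengths are illustrative assumptions only; they are not the authors' pipeline and not the actual TAG0014 contig lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L of the contig at which, when contigs are
    sorted from longest to shortest, the running total first reaches
    at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")

    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2

    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in Mbp), not data from the paper.
toy_contigs = [40, 30, 20, 10]
print(n50(toy_contigs))
```

In the toy example the total is 100 Mbp, so the running sum first reaches half of the total at the second-longest contig. A higher N50 for a similar total length indicates that the assembly is made up of fewer, longer contigs.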
Paper summary
We assembled a haplotype-phased genome for Eucalyptus grandis that will serve as a reference for the most widely planted hardwood crop globally. It includes more than 430 new gene models with long introns and has roughly 6 percentage points higher annotation completeness (93.8% to 99.4%). The phased assembly provides a more accurate view of genome variation at the DNA and transcript levels and will better support future studies of genome structure and function. The improved assembly contains more tandem duplicate genes than the previous unphased reference. Finally, major genomic rearrangements between the two phased haplotypes provide a preview of pangenome and structural variation in E. grandis.