Genomic decoding of Theobroma grandiflorum (cupuassu) at chromosomal scale: Evolutionary insights for horticultural innovation

Abstract

Background

Theobroma grandiflorum (Malvaceae), known as cupuassu, is a tree indigenous to the Amazon Basin, valued for its large fruits and seed-pulp, contributing notably to the Amazonian bioeconomy. The seed-pulp is utilized in desserts and beverages, and its seed butter is used in cosmetics. Here, we present the sequenced telomere-to-telomere cupuassu genome, disclosing features of the genomic structure, evolution, and phylogenetic relationships within the Malvaceae.

Results

The cupuassu genome spans 423 Mb, encodes 31,381 genes distributed across ten chromosomes, and exhibits approximately 65% gene synteny with the T. cacao genome, reflecting a conserved evolutionary history, albeit punctuated with unique genomic variations. The main changes are marked by bursts of long terminal repeat retrotransposon expansion after species divergence, retrocopied and singleton genes, and gene families displaying distinctive patterns of expansion and contraction. Furthermore, positively selected genes are evident, particularly among retained and dispersed, tandem, and proximal duplicated genes associated with general fruit and seed traits and defense mechanisms, supporting the hypothesis of potential episodes of subfunctionalization and neofunctionalization following duplication, as well as the impact of a distinct domestication process. These genomic variations may underpin the differences observed in fruit and seed morphology, ripening, and disease resistance between cupuassu and other Malvaceae species.

Conclusions

Sequencing the cupuassu genome offers a foundational resource for both breeding and conservation efforts, yielding insights into the evolution and diversity within the genus Theobroma.

Core ideas

  • Telomere-to-telomere sequencing of the Theobroma grandiflorum genome reveals approximately 65% gene synteny with T. cacao.

  • Retrotransposon expansion identified as a pivotal factor in post-divergence genomic evolution between Theobroma species.

  • Comparative genomics has revealed genes associated with key agronomic traits, providing evolutionary insights.

  • Positive selection pressure in retained duplicated genes implicated in adaptive functions and fruit-seed traits diversity.

  • Cupuassu genome as a genetic resource for breeding and to boost the Brazilian Amazonian bioeconomy.

AUTHOR SUMMARY

Cupuassu, a fruit-bearing tree from the Amazon, is prized for its nutritious fruits and seeds, widely used in food and cosmetics. In this study, we sequenced the complete genome of cupuassu to understand its development, unique traits, and genetic relationship with cacao and other related species. The cupuassu genome shares a high similarity with cacao, but it also exhibits distinctive features. Notably, repetitive DNA elements have significantly influenced its genomic structure. Furthermore, specific genes responsible for its fruit and seed characteristics, as well as disease resistance, were identified. Overall, this research not only deepens our knowledge of cupuassu genetics but also illuminates broader aspects of plant evolution and diversity in the Amazon. It lays the groundwork for advanced breeding programs and promises to contribute significantly to the Amazonian bioeconomy. Ultimately, these findings have important implications for agriculture and conservation, highlighting the intricate processes of plant adaptation and evolution.

Article activity feed

  1. Results

    Reviewer 2: Jian-Feng Mao

    Rafael et al. contributed their study, "Genomic decoding of Theobroma grandiflorum (cupuassu) at chromosomal scale: Evolutionary insights for horticultural innovation". In this study, a high-quality genome assembly for an important plant was generated, and the authors further investigated genome characterization, genome evolution, gene families, etc. The data quality is high, though some points need to be clarified. The reported data and investigations could provide a valuable reference for subsequent studies. This paper is generally well prepared.

    Major comments:

    1. Quality control of genome assembly. The quality of the genome assembly could be better evaluated with more stringent parameters. For assembly quality control, I recommend always following the criteria established by the Earth BioGenome Project (Report on Assembly Standards, https://www.earthbiogenome.org/assembly-standards). Please evaluate the present assemblies against the EBP criteria, on at least some if not all of the items. At the least, I think Merqury results would be very informative. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies, https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02134-9

    2. Gaps in each pseudo-chromosome. It is not clear whether gaps remain or whether the genome is gap-free.

    3. Centromere region. How were the centromeres identified? Centromeres are shown, but there is no description of how they were identified. Given the high quality of the genome assembly, it would be very interesting to investigate the distribution of centromeres. A pipeline, Centromics (https://github.com/ShuaiNIEgithub/Centromics), which identifies centromeres with multi-omics data such as repeat profiling and Hi-C chromatin contact, is helpful. It is generally described at https://academic.oup.com/hr/article/10/1/uhac241/6775201?login=true and has already been widely applied in data analyses for some recently published T2T assemblies.

  2. Background

    Reviewer 1: Xupo Ding

    1. Line or page numbers should be added in the revised manuscript; it is hard to point a comment to a definite line.

    2. The methods and parameters of the TE analysis should be detailed in the main text or a supplementary file, especially for the LAI calculation. The LAI output by our pipeline is 11.47, and the pipeline was built according to the default parameters of LTR_retriever (https://github.com/oushujun/LTR_retriever).

    3. What was the mutation rate (r) used for the TE insertion time calculation? If the insertion times were taken from the original EDTA output files, please note that the default r is 1.3e-8, from the grass family, when --u is not set when running EDTA; the times should be converted with the correct r value.

    4. Generally, Gypsy content is higher than Copia content in plant genomes; please check this. If it is correct, please infer the reason.

    5. All GO enrichment results would be better complemented with KEGG enrichment.

    6. The enrichment results were written hastily; many GO functions or GO numbers are merely listed, and more detail should be given. Cite the figures, tables, or references in these sections.

    7. In Figure 1C, the Ks distribution needs correction; the authors can refer to the polyploidization of the durian genome published in Plant Physiology in 2019.

    8. In Figure 2C, why do some TE orders lack the SD?

    9. In Figure 3A, T. grandiflorum and T. cacao are highly syntenic at the gene level; the Liftoff software might detect extra genes in the T. grandiflorum genome based on the T. cacao genome. This is just a suggestion.

    10. In Figure 5A, there are 282 specific genes in T. grandiflorum; please perform GO and KEGG enrichment on them.

    11. Figure 5B and D are from the GO enrichment; the GO numbers should be added next to the annotations or listed in the supplementary files.

    12. In Figure 5C, the confidence interval of the divergence time should be added.

    13. In the data availability section, the weblink is not accessible to everyone; GigaDB will record your data, so the unopened weblink might not be necessary.

    14. In the MS, disease resistance is mentioned repeatedly. The GO enrichment has provided some evidence; it would be better to perform KEGG analysis with the specific genes and the expanded or contracted genes to verify this, especially to state the changes in ko04626.

    15. The language must be improved and revised by a native academic English speaker.
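Comment 3 of Reviewer 1 turns on simple arithmetic that may be worth spelling out. LTR insertion times are commonly estimated as T = K / (2r), where K is the sequence divergence between the two LTRs of an element and r is the substitution rate per site per year. Because T is inversely proportional to r, a time computed with one rate can be converted to another rate by multiplying by r_old / r_new. The following is a minimal Python sketch under those assumptions. The function names are hypothetical and are not part of EDTA or LTR_retriever, and the replacement rate in the usage example is a placeholder, not a value from the manuscript or one recommended for Theobroma.

```python
# Default rate (substitutions per site per year) that the reviewer attributes
# to EDTA when --u is not set.
EDTA_DEFAULT_RATE = 1.3e-8


def insertion_time(ltr_divergence: float, rate: float) -> float:
    """Estimate LTR insertion time in years as T = K / (2 * r)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ltr_divergence / (2.0 * rate)


def rescale_insertion_time(time_years: float, old_rate: float, new_rate: float) -> float:
    """Convert a time computed with old_rate to the value implied by new_rate."""
    if old_rate <= 0 or new_rate <= 0:
        raise ValueError("rates must be positive")
    # T is proportional to 1 / r, so T_new = T_old * r_old / r_new.
    return time_years * old_rate / new_rate


if __name__ == "__main__":
    # Placeholder rate for illustration only; substitute a justified value.
    HYPOTHETICAL_RATE = 6.5e-9
    t_default = insertion_time(0.026, EDTA_DEFAULT_RATE)
    t_rescaled = rescale_insertion_time(t_default, EDTA_DEFAULT_RATE, HYPOTHETICAL_RATE)
    print(f"{t_default:.3e} years at default r; {t_rescaled:.3e} years at hypothetical r")
```

Rescaling existing estimates avoids rerunning the annotation, but it is only valid if the reported times were derived with the simple T = K / (2r) relationship and a single known rate.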