The haplotype-resolved chromosome pairs and transcriptome of a heterozygous diploid African cassava cultivar

Abstract

Background

Cassava (Manihot esculenta) is an important clonally propagated food crop in tropical and sub-tropical regions worldwide. Genetic gain by molecular breeding is limited because cassava has a highly heterozygous, repetitive and difficult-to-assemble genome.

Findings

Here we demonstrate that Pacific Biosciences high-fidelity (HiFi) sequencing reads, in combination with the assembler hifiasm, produced genome assemblies at near-complete haplotype resolution, with higher continuity and accuracy than assemblies from conventional long reads. We present two chromosome-scale haploid genomes, phased with Hi-C technology, for the diploid African cassava variety TME204. Genome comparisons revealed extensive chromosome rearrangements and abundant intra-genomic and inter-genomic divergent sequences despite high gene synteny, with most large structural variations being related to LTR retrotransposons. Allele-specific expression analysis of different tissues based on the haplotype-resolved transcriptome identified both stable and inconsistent alleles with imbalanced expression patterns, while most alleles were expressed coordinately. Among tissue-specific differentially expressed transcripts, coordinately regulated and haplotype-biased transcripts were enriched for different biological processes. We use the reference-quality assemblies to build a cassava pan-genome and demonstrate its importance in representing the genetic diversity of cassava for downstream reference-guided omics analysis and breeding.
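The abstract reports allele-specific expression (ASE) results without describing the test used. As a generic illustration only, and not the authors' pipeline, ASE at a heterozygous locus is often assessed by asking whether reads assigned to each haplotype depart from the 1:1 ratio expected under balanced expression. The sketch below assumes per-transcript read counts have already been assigned to H1 and H2; the function names, the 10-read coverage floor and the 0.05 threshold are hypothetical choices, and no multiple-testing correction is applied.

```python
import math


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed count k. Probabilities are
    computed in log space so that large read counts do not overflow."""
    def log_pmf(i: int) -> float:
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log(1 - p))

    observed = log_pmf(k)
    # The small tolerance treats outcomes equal to the observed one
    # (up to floating-point rounding) as ties and includes them.
    total = sum(math.exp(lp) for lp in map(log_pmf, range(n + 1))
                if lp <= observed + 1e-7)
    return min(1.0, total)


def classify_ase(h1_reads: int, h2_reads: int,
                 alpha: float = 0.05, min_reads: int = 10) -> str:
    """Hypothetical ASE call for one transcript from haplotype-assigned reads."""
    n = h1_reads + h2_reads
    if n < min_reads:
        return "low_coverage"      # too few reads to test reliably
    if binom_two_sided_p(h1_reads, n) >= alpha:
        return "balanced"          # consistent with 1:1 expression
    return "H1_biased" if h1_reads > h2_reads else "H2_biased"


if __name__ == "__main__":
    for counts in [(10, 10), (18, 2), (2, 18), (3, 4)]:
        print(counts, classify_ase(*counts))
```

Running the same test per tissue would distinguish alleles whose imbalance is stable across tissues from those whose bias changes, which is the kind of contrast the abstract describes.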

Conclusions

The haplotype-resolved genome allows the first systematic view of the heterozygous diploid genome organization in cassava. The completely phased and annotated chromosome pairs will be a valuable resource for cassava breeding and research. Our study may also provide insights into developing cost-effective and efficient strategies for resolving complex genomes with high resolution, accuracy and continuity.

Peer reviews

  1. This work has been peer reviewed in GigaScience (https://doi.org/10.1093/gigascience/giac028), which carries out open, named peer-review.

    These reviews are published under a CC-BY 4.0 license and were as follows:

    Reviewer 2: C Robin Buell

    This manuscript describes the sequencing, assembly, annotation and analysis of a cassava genome. A cassava genome has already been published, but this manuscript describes the genome of a heterozygous cultivar rather than the slightly inbred cultivar published previously. The authors performed the assembly with a number of assemblers and benchmarked each assembly. Not surprisingly, they found that hifiasm worked best with HiFi reads. The authors then annotated the genome and performed a set of analyses, including allele-specific expression and pan-genome analyses.

    The manuscript and its genome will be of use to a range of users in the genomics field. However, I feel that the manuscript is exceedingly long and reads more like a dissertation than a research article. A significant portion of the text could be deleted without affecting its take-home messages. For example, the analyses of allele-specific expression, alternative splice form expression and the pan-genome are extremely limited in depth and breadth. If these remain in the manuscript, the authors should extend them by examining a wider range of tissues and genomes, as extensive genomic resources are available for cassava. It would be nice to tie this complete, phased assembly to the previous diversity analyses in cassava that revealed the basis of genetic load.

    De novo annotation of the assembly was not performed. Instead, the authors projected the reference annotation onto their assembly and then aligned transcript data derived from Iso-Seq. The authors are misinterpreting the pseudogenes. As shown earlier by Gan et al. (2011) in Arabidopsis, projecting a reference annotation onto other genome assemblies fails to capture alternative splice forms, and thus pseudogene predictions from projected annotations are grossly inaccurate. De novo annotation using cognate transcript evidence should be performed to ensure that artifacts are not introduced into the annotation. This would also allow the authors to investigate more deeply the dysfunctional/deleterious alleles present in cassava, a vegetatively propagated crop.

  2. Reviewer 1: Zehong Ding

    In this manuscript, Qi et al. assembled two chromosome-scale haploid genomes of the African cassava cultivar TME204, validated the structural and phasing accuracy of the haplotigs using BACs and a high-density genetic map, revealed extensive chromosome rearrangements and abundant intra-genomic and inter-genomic divergent sequences, analyzed allele-specific expression patterns in different tissues, and built a cassava pan-genome and demonstrated its importance in downstream omics analysis. Overall, this work is of crucial importance and should be suitable for publication in GigaScience. However, I found that the manuscript lacks basic logical coherence and that some analyses have major flaws. Please see the details below:

    1. According to Supplementary Table 10, there are TME204 Illumina RNA-seq data from at least 9 different tissues. However, in the analysis of 'Tissue specific differentially expressed transcripts' (Line 393), why did the authors compare only leaf and stem and ignore the remaining tissues? This is illogical.

    2. Two cassava haplotypes (H1 and H2) were constructed in this study. In Table 4 and Supplementary Figure 9, why did the authors perform the analysis for 'TME204 H1 vs. AM560' but not mention the comparison 'TME204 H2 vs. AM560' at all? Similarly, in Fig. 8 and Fig. 10c, the analysis was performed on 'TME204 H1' but not on 'TME204 H2'.

    3. In Fig. 7C, ASE should be a comparison of expression levels between H1 and H2, so why are the legends still H1 (red bar) and H2 (blue bar)? I cannot understand this. Fig. 7D is also very difficult to understand. For example, what is the meaning of the x-axis labels (e.g., "leaf_H1" and "Stem_H1; Leaf_H1")? Logically, there are "stem_H1; leaf_H1", "stem_H1; leaf_H2" and "stem_H2; leaf_H2", so where is "stem_H2; leaf_H1"?

    4. Fig. 6d, Lines 110-111: "The transcriptome comparison between TME204 leaf and stem tissues identified gene loci with associated transcripts that were differentially regulated in one haplotype only." This statement is not true, because a comparison between leaf and stem cannot establish that transcripts were differentially regulated in only one haplotype. The sentences in Lines 407-408 therefore also need to be revised.

    Other suggestions to the authors:

    • Fig. 6a: what is the meaning of Het_Uniq, Het_Dup, Hom_Uniq and Hom_Dup?

    • Fig. 6d: what is the meaning of the legend bar? Is it log2(leaf/stem) or log2(stem/leaf)?

    • Ref. 30 cannot be cited because it is still in preparation.

    • In the Conclusions section, the statement "The haplotype-resolved genome allows the first systematic view of the heterozygous diploid genome organization in cassava." is inaccurate, because the two haplotypes of a heterozygous cassava genome have already been published by Hu et al. (2021, Molecular Plant, 10.1016/j.molp.2021.04.009).

    • I also suggest changing the title, as it is not attractive.

    • The citations of 'Figure 10b' (Line 497) and 'Figure 10c' (Line 502) are incorrect.