Genome evolution and transcriptome plasticity associated with adaptation to monocot and eudicot plants in Colletotrichum fungi

Abstract

Colletotrichum fungi infect a wide diversity of monocot and eudicot hosts, causing plant diseases on almost all economically important crops worldwide. In addition to its economic impact, Colletotrichum is a suitable model for the study of gene family evolution on a fine scale to uncover events in the genome that are associated with the evolution of biological characters important for host interactions. Here we present the genome sequences of 30 Colletotrichum species, 18 of them newly sequenced, covering the taxonomic diversity within the genus. A time-calibrated tree revealed that the Colletotrichum ancestor diverged in the late Cretaceous around 70 million years ago (mya) in parallel with the diversification of flowering plants. We provide evidence of independent host jumps from eudicots to monocots during the evolution of this pathogen, coinciding with a progressive shrinking of the degradative arsenal and expansions in lineage-specific genes. Comparative transcriptomics of four reference species with different evolutionary histories and adapted to different hosts revealed similarity in gene content but differences in the modulation of their transcription profiles. Only a few orthologs show similar expression profiles on different plant cell walls. Combining genome sequences and expression profiles, we identified a set of core genes, such as specific transcription factors, involved in plant cell wall degradation in Colletotrichum. Together, these results indicate that the ancestral Colletotrichum were associated with eudicot plants and that certain branches progressively adapted to different monocot hosts, reshaping part of the degradative and transcriptional arsenal.

Article activity feed

    Reviewer 2: Nicolas Lapalu This manuscript describes the adaptation of the Colletotrichum genus to monocotyledonous and dicotyledonous plants with regard to the content and expression of genes from 30 genomes, with a subsample of 4 genomes used for transcriptomic analyses. Major remarks: "Considering that the analyses carried out are affected by the sampling, as closely related species are likely to have more shared genes compared to species that are more distant from others," Yes, indeed, this is clearly a possible bias due to the sampling, as you write. Because you considered all genomes together to define specific genes, monocot-specific species have few specific genes owing to their phylogenetic proximity. Based on this, could you address the following observations, drawn from the combination of Figures 1 and 2:

    1. The number of specific genes in C. eremochloae (1608) vs C. sublineola (1643) is high, although the divergence time between the two seems short and similar to that within the group of C. lupini, C. costaricense (monocot) …, which have approximately 100 specific genes per species. How could such closely related genomes have acquired so many specific genes in such a short time compared with other species over the same period of evolution?
    2. The same remark applies to C. phormii (911) vs C. salicis (286), which is even more puzzling given the switch to dicots and the loss of many genes in C. salicis. For both cases mentioned above, a detailed comparison between the two genomes could help explain the events and genes involved. Moreover, the phylogenetic tree (Figure 1) could be interpreted as three clusters of genomes based on evolutionary time and plant host: monocot, "old" dicot (C. orbiculare, C. noveboracense, …) and "young" dicot (C. melonis, C. cuscutae, …). Did the authors attempt an analysis with such a view of the data? This might complement the comparison of the C. acutatum complex (46 genes) with the C. graminicola complex (28 genes), from which C. orchidophilum and C. phormii are excluded. Finally, one of the most interesting observations is the proximity of C. phormii and C. salicis in the same clade despite a recent host specialization. Despite the poor quality of the C. salicis genome compared with C. phormii, a comparative genomic approach with a tool like Synchro could provide clues about gene losses and their location (along the whole genome or in specific regions).

    Figure 3: Please explain Figure 3A, described as a PCA, in more detail. No axis (dimension) is shown with the percentage of variance it explains for the divergence between organisms. This is confusing and does not allow me to tell whether the gene sets used to compare the 4 genomes include only shared genes or all genes. The rest of the figure is much clearer, and the comments on species specificity (under- or over-expression of genes) for each genome are clear.

    Figure 4: "the expression of the orthologous genes was clustered for the four fungal species (Figure 4A)". As written, this implies that you used orthologous genes established across the 4 species, which does not appear to be the case given the many genes missing for C. graminicola in Figure 4.
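To illustrate what the comment on the Figure 3A PCA asks for, here is a minimal Python sketch (not the authors' pipeline) of a PCA computed by singular value decomposition with NumPy that reports the fraction of variance explained by each axis. It assumes a hypothetical samples-by-genes matrix of log-transformed expression values; the function names are illustrative only. Whether the columns are restricted to orthologs shared by all four species or include all genes is exactly the choice the reviewer asks to be stated.

```python
import numpy as np


def pca_explained_variance(expr: np.ndarray, n_components: int = 2):
    """PCA of a samples-by-genes expression matrix via singular value decomposition.

    Returns (scores, explained_ratio): the sample coordinates on the first
    n_components axes and the fraction of total variance each axis explains.
    """
    # Center every gene (column) so the components describe variation around the mean.
    centered = expr - expr.mean(axis=0)
    # Thin SVD: centered = u @ diag(s) @ vt; squared singular values are proportional
    # to the variance captured by each principal axis.
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    explained_ratio = variance / variance.sum()
    # Sample scores on each axis are the left singular vectors scaled by s.
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained_ratio[:n_components]


def axis_labels(explained_ratio):
    # Axis titles of the form "PC1 (xx.x%)" that report the variance explained.
    return [f"PC{i + 1} ({100 * r:.1f}%)" for i, r in enumerate(explained_ratio)]
```

Printing `axis_labels(ratio)` next to the plotted scores gives axis titles that make the share of variance on each dimension explicit.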
To continue on this point, I could not find the minimum number of species required in a cluster to define a cluster of orthologs (it may be stated, but I did not find it). What is the threshold for divergence or sequence similarity? Have you considered sequence length (query coverage vs subject coverage) to allow clustering of potentially split or fragmented genes in the annotations?

Minor remarks: The authors limit their analysis to 30 genomes, whereas more than 270 Colletotrichum genomes from over 70 species are available. Research clearly takes longer than generating genomic resources, but it could be interesting to list a few new genomes missing from those analysed that could add significant value (particularly if sequenced with long reads, providing complete genomes). Transcriptomic analyses were carried out on 4 genomes. The choice of these genomes was not discussed and may have been made for convenience, based on strains available in the lab. In fact, C. higginsianum is well sequenced, assembled and studied, and was chosen as one of the dicot-specific species, whereas it is a member of the C. destructivum complex. Similarly, C. phormii appears to be a recent species with an adaptation to monocots. Line 113: "species with bigger genomes are characterized by a lower GC content": please rephrase the link between genome size and GC content. Between species of the same genus, genome size is most often linked to the invasion of transposable elements (TEs), whether or not they have been affected by RIP in fungi. Strongly RIP-affected genomes (Leptosphaeria, Venturia) are not always large compared with those of other species.

Data availability: All genomes were released in public databases, but I could not find accession numbers for the RNA-Seq runs. Many supplementary details have been provided. I appreciate the BUSCO logs for checking the completeness of gene sets, which give me some clues about the quality of the genome annotation, which is never discussed or pointed out in the manuscript as a possible source of bias.
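The coverage question above can be made concrete with a small Python sketch. It is not the clustering method used in the manuscript: the `Hit` fields mirror columns that BLAST+ tabular output can report (`pident`, `qstart`, `qend`, `sstart`, `send`, `qlen`, `slen`), while the 30% identity and 50% coverage thresholds, the `mode` switch, and the function names are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    pident: float  # percent identity over the aligned region
    qstart: int    # 1-based alignment coordinates on the query
    qend: int
    sstart: int    # 1-based alignment coordinates on the subject
    send: int
    qlen: int      # full lengths of the query and subject proteins
    slen: int


def coverage(start: int, end: int, length: int) -> float:
    # Fraction of a sequence covered by an inclusive aligned interval;
    # sorting tolerates reversed coordinates.
    lo, hi = sorted((start, end))
    return (hi - lo + 1) / length


def keep_hit(hit: Hit, min_pident: float = 30.0, min_cov: float = 0.5,
             mode: str = "both") -> bool:
    """Decide whether a pairwise hit should link two proteins before clustering.

    mode="both": query and subject must each be covered by at least min_cov.
    mode="shorter": only the shorter sequence must be covered, which lets a
    fragmented (split) gene model join the cluster of its full-length ortholog.
    """
    if hit.pident < min_pident:
        return False
    qcov = coverage(hit.qstart, hit.qend, hit.qlen)
    scov = coverage(hit.sstart, hit.send, hit.slen)
    if mode == "both":
        return qcov >= min_cov and scov >= min_cov
    if mode == "shorter":
        return (qcov if hit.qlen <= hit.slen else scov) >= min_cov
    raise ValueError(f"unknown mode: {mode}")
```

Requiring coverage on both sequences keeps a fragmented gene model out of its full-length ortholog's cluster, whereas requiring coverage of the shorter sequence only lets it join, at the cost of admitting more partial or domain-level matches.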
Overall, the manuscript is very interesting and confirms previously reported results on the specificities of CAZy families associated with host plant adaptation in the genus Colletotrichum. The authors demonstrate a great knowledge of the CAZome and associated biological processes, which provides a great deal of valuable information for the community working on Colletotrichum and, more generally, for all those working on such enzymes. Finally, the transcriptomic data suggest that species specificity and host adaptation are related more to expression patterns than to specific gene content.

    Reviewer 1: Jamie McGowan In this study, Baroncelli and colleagues carry out a comprehensive analysis of genome evolution in Colletotrichum fungi, an important group of plant pathogens with diverse and economically significant hosts. Their comparative genomic and phylogenomic analyses are based on the genome sequences of 30 Colletotrichum species spanning the diversity of the genus, including pathogens of dicots, monocots, and both dicots and monocots. These include 18 genome sequences that are newly reported in this study. They also perform comparative transcriptomic analyses of 4 Colletotrichum species (2 dicot pathogens and 2 monocot pathogens) on different carbon sources. Overall, I thought the manuscript was very well written and technically sound. The results should be of interest to a broad audience, particularly to those interested in fungal evolutionary genomics and plant pathology. I only have a few minor comments.

    Minor comments:
    (1) Lines 50-51: "The plant cell wall (PCW) consists of many different polysaccharides that are attached not only to each other through a variety of linkages providing the main strength and structure for the PCW". I found this confusing; is the sentence incomplete?
    (2) Line 66: "Some Colletotrichum species show…" I think there should be a couple of introductory sentences about Colletotrichum before this.
    (3) Figure 1: It would be informative to label which genomes were sequenced with PacBio versus Illumina only.
    (4) Lines 254-255: "As no other enrichment was identified we performed a manual annotation of genes identified in Figure 3D". I don't think it is clear here what manual annotation this refers to.
    (5) One area where I felt the analysis was lacking was genome repeat content. The authors highlight the large variation in genome size among Colletotrichum species (~44 Mb vs ~90 Mb) and show in Figure 1 that this correlates with increased non-coding DNA. It would have been interesting to determine whether this is driven by the proliferation of particular repeat families.
    (6) Another concern is the inconsistent use of genome annotation methods. 12 of the genomes reported in this study were annotated using the JGI annotation pipeline, whereas the other 6 were annotated using the MAKER pipeline. Several studies (e.g., Weisman et al., 2022, Current Biology) show that inconsistent genome annotation methods can inflate the number of observed lineage-specific genes. The authors may wish to comment on this or demonstrate that it is not an issue in their study (e.g., by aligning lineage-specific proteins against the other genome assemblies).
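The check suggested in comment (6) can be sketched in Python as follows. This is a hypothetical illustration, not the authors' procedure: it assumes precomputed orthogroups (gene to orthogroup, and orthogroup to member species) and a hypothetical `genomic_hits` mapping that lists, for each protein, the species whose genome assemblies contain a significant alignment to it (for example from a tblastn search). A gene is first called lineage-specific from the annotation alone and is kept as a robust call only if no assembly outside the lineage contains a homolog that the annotation missed.

```python
def lineage_specific_genes(gene_species, gene_og, og_members, clade, genomic_hits):
    """Split annotation-based lineage-specific calls into all candidates and robust calls.

    gene_species: gene id -> species it was annotated in
    gene_og:      gene id -> orthogroup id (genes left unclustered are absent)
    og_members:   orthogroup id -> set of species with at least one member
    clade:        set of species forming the lineage of interest
    genomic_hits: gene id -> set of species whose genome assemblies contain a
                  significant alignment to the protein (hypothetical input)
    """
    candidates, robust = set(), set()
    for gene, species in gene_species.items():
        if species not in clade:
            continue
        og = gene_og.get(gene)
        # An unclustered gene is only "present" in its own species.
        members = og_members[og] if og is not None else {species}
        if not members <= clade:
            continue  # an annotated ortholog exists outside the lineage
        candidates.add(gene)
        # Keep the call only if no assembly outside the lineage holds an unannotated homolog.
        if not (genomic_hits.get(gene, set()) - clade):
            robust.add(gene)
    return candidates, robust
```

Comparing the sizes of the two returned sets gives a rough indication of how many lineage-specific calls could be annotation artifacts rather than true gene gains.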