A high-quality pseudo-phased genome for Melaleuca quinquenervia shows allelic diversity of NLR-type resistance genes


Abstract

Background

The coastal wetland tree species Melaleuca quinquenervia (Cav.) S.T.Blake (Myrtaceae), commonly named the broad-leaved paperbark, is a foundation species in eastern Australia, Indonesia, Papua New Guinea, and New Caledonia. The species has been widely grown as an ornamental, becoming invasive in areas such as Florida in the United States. Long-lived trees must respond to a wide range of pests and pathogens throughout their lifespan, and immune receptors encoded by the nucleotide-binding domain and leucine-rich repeat containing (NLR) gene family play a key role in plant stress responses. Expansion of this gene family is driven largely by tandem duplication, resulting in a clustering arrangement on chromosomes. Due to this clustering and their highly repetitive domain structure, comprehensive annotation of NLR-encoding genes within genomes has been difficult. Additionally, as many genomes are still presented in their haploid, collapsed state, the full allelic diversity of the NLR gene family has not been widely published for outcrossing tree species.

Results

We assembled a chromosome-level pseudo-phased genome for M. quinquenervia and describe the full allelic diversity of plant NLRs using the novel FindPlantNLRs pipeline. Analysis reveals variation in the number of NLR genes on each haplotype, differences in clusters, and differences in the types and numbers of novel integrated domains.

Conclusions

We anticipate that the high quality of the genome for M. quinquenervia will provide a new framework for functional and evolutionary studies into this important tree species. Our results indicate a likely role for maintenance of NLR allelic diversity to enable response to environmental stress, and we suggest that this allelic diversity may be even more important for long-lived plants.

Article activity feed

  1. Competing Interest Statement: The authors have declared no competing interest.

    Reviewer 2--Julia Voelker

    The manuscript about NLR-type resistance genes in two haplotypes of Melaleuca quinquenervia is a relevant contribution to research on Myrtaceae genomes and other long-lived trees. The methods are well described and should be reproducible with the available information and raw data, provided the authors mentioned all non-default settings in the Methods section. The FindPlantNLRs pipeline seems to be well documented on GitHub.

    I believe that this manuscript is ready for publication after some small changes. Page and line numbers in the comments below refer to the PDF document:

    1. The quality of some figures is not good (even after downloading and zooming into the plot) and should be improved to a higher resolution for publication. Especially in Figure 3, all labels are too pixelated and hard to read. I would also recommend increasing the text size for this figure. In Figure 6 D & E, the authors should consider using consistent text sizes on the axes, and even though the quality is acceptable, a higher resolution of the labels would still be better.

    2. p. 10, Table 2: Although it is a standard statistic for genome assemblies, it would be helpful for some readers to specify what N50 and L50 are.

    3. p. 19, line 436: I believe the authors are referring to the wrong figure number.
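    As an illustration of comment 2, the two statistics can be stated as a short calculation. This is a minimal sketch under the usual definitions: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length, and L50 is the number of sequences in that set. The function name `n50_l50` and the toy lengths are illustrative assumptions and are not taken from the manuscript.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk through the sequences from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first sequence that pushes the running sum to half of the
        # assembly defines N50 (its length) and L50 (how many were needed).
        if running >= half_total:
            return length, count


# Toy lengths for illustration only, not values from Table 2.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(example_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

    For the toy lengths above the total is 300, so the two longest sequences (80 + 70 = 150) already reach half of the assembly, giving N50 = 70 and L50 = 2.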

    Below are some additional comments regarding typos or other language issues. While the text is generally well written, I would appreciate commas in certain sentences to improve readability, and I think that some nouns are missing articles. I hope the authors will read through their text again and add articles where required; I won't point them out individually.

    p. 4, line 33: wide range of
    p. 7, line 130: 'a' instead of 8?
    p. 8, line 177: genome
    p. 12, line 250: chromosome 2, add comma before 'while' in next line
    p. 12, line 253: on all other chromosomes?
    p. 13, line 271: to occur?
    p. 16, line 347: remove 'and'
    p. 17, lines 382, 384: orthologs?
    p. 20, line 469: 'lead to the triggering of defence response': rephrase to make sense with the previous half of the sentence; also, 'defence response' should have an article
    p. 20, line 489/490: missing word?

  2. Reviewer 1 – Andrew Read – University of Minnesota

    In the manuscript, A high-quality pseudo-phased genome for Melaleuca quinquenervia shows allelic diversity of NLR-type resistance genes, the authors assemble and analyze a phased genome of a long-lived tree species. In addition to providing a phased genomic resource for an important species, the authors analyze and compare the NLR gene complement in each of the two diploid genomes. I was surprised by the level of diversity of NLR genes in the two copies of the genome (this may be due to my biases based on working in highly homozygous species). This level of within-individual diversity has been largely overlooked by researchers owing to the difficulties of sequencing, assembly, and NLR identification. To address NLR identification, the authors publish a very nice pipeline that combines available tools into a framework that makes a lot of sense to me and will be valuable to anyone doing NLR gene work on new or existing genome assemblies. My main concern comes from not knowing how sequencing gaps and NLRs correlate across the two diploid genomes. Other than this, I think it’s a very nice paper that adds to the growing catalog of NLR gene diversity by tackling the challenge of NLRs in a heterozygous genome.

    Many of the authors’ interesting observations are based on comparisons of NLRs on the two haploid genomes; however, some things are not clear to me:
    1.  Do any predicted NLR-genes overlap gaps in the alternative haploid genome? 
    2.  If there is a predicted NLR-gene in one haploid genome and not the alternative genome, what is at the locus? Is it a structural variant indicating insertion/deletion of the NLR or is there ‘NLR-like’ sequence there that just didn’t pass the pipeline filters indicating an NLR fossil (or similar) – to me this is an important distinction.
    3.  How many of the NLR-genes on the two haploid genomes cluster 1:1 with their homolog on the alternative haploid genome? I’m particularly interested in the 15 ‘mismatched’ N-term-NBARC examples. It would be nice to know if these have partners in the alternative haploid genome, and if the partner has the same mismatch (if not, it would support the proposed domain swapping story).
    I believe each of these concerns will require whole genome alignment of the two haploid genomes.
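    Question 1 above amounts to an interval-overlap check once the coordinates are comparable. The following is a minimal sketch, assuming the NLR gene coordinates from one haplotype have already been projected onto the alternative haplotype (for example from the whole genome alignment mentioned above) and that assembly gaps are available as intervals. The function name, the gene identifiers, and the 0-based half-open coordinate convention are hypothetical choices for illustration, not part of the FindPlantNLRs pipeline.

```python
from collections import defaultdict


def genes_overlapping_gaps(genes, gaps):
    """Report which genes overlap at least one assembly gap.

    genes: iterable of (gene_id, chrom, start, end)
    gaps:  iterable of (chrom, start, end)
    Coordinates are assumed to be 0-based and half-open, and to refer to
    the same (alternative) haplotype for both inputs.
    Returns a dict mapping gene_id to the list of overlapping gaps.
    """
    gaps_by_chrom = defaultdict(list)
    for chrom, start, end in gaps:
        gaps_by_chrom[chrom].append((start, end))

    hits = {}
    for gene_id, chrom, g_start, g_end in genes:
        # Two half-open intervals overlap when each starts before the other ends.
        overlapping = [
            (s, e) for s, e in gaps_by_chrom.get(chrom, [])
            if s < g_end and g_start < e
        ]
        if overlapping:
            hits[gene_id] = overlapping
    return hits


# Hypothetical toy data, not taken from the manuscript.
toy_genes = [
    ("NLR_a", "chr1", 100, 500),
    ("NLR_b", "chr1", 900, 1200),
    ("NLR_c", "chr2", 50, 300),
]
toy_gaps = [("chr1", 450, 460), ("chr2", 300, 400)]
print(genes_overlapping_gaps(toy_genes, toy_gaps))
```

    In the toy data only NLR_a overlaps a gap; NLR_c merely touches the chr2 gap at position 300, which does not count as an overlap under the half-open convention. A simple linear scan is used here for readability; for genome-scale inputs an interval tree or an established tool such as bedtools would likely be a better fit.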
    

    Additional comments (by line where indicated)

    The authors introduce the idea that M. quinquenervia is invasive in Florida, but this thread is never followed up in the discussion, which makes it feel a bit awkward. It would help if the authors clarified how the genome could help with management in native and invasive ranges.

    Could the authors add some context for why ONT data was included and how it was used?

    It would be helpful if the authors provided a weblink to the iTOL tree.

    164-166 – The observation of inversions potentially caused by assembly errors is nice!

    206 – add reference: Bayer PE, Edwards D, Batley J (2018) Bias in resistance gene prediction due to repeat masking. Nat Plants 4: 762–765. pmid:30287950

    240-246 – I’m not sure about excluding these incomplete NLRs – it would be interesting and potentially informative to see where they cluster (do they cluster with an NLR from the alternative haplotype? If so, it may indicate truncation of one copy, etc.) – however, if the authors wish to remove these at this step, I think they can add a statement like “we were interested in full-length NLRs, the filtered incomplete NLRs may represent….”

    429-430 – The criteria used to define clusters are described in the methods. Can you confirm (and mention) that these are the same as those used in the analyses you’re comparing to for E. grandis, rice, and Arabidopsis?

    435-437 – I’m interested to know if the four heterogeneous clusters contain any of the N-term domain-swapped NLRs.

    479-480 – The zf-BED domain is also present in rice NLRs – include citation for Xa1/Xo1

    523-524 – Can you specify which base-call model was used on the ONT data?

    I’m curious about the presence/absence of IDs in the analyzed NLRs, and would like to know if the authors observe syntenic homologs across the two haploid genomes that show ID presence/absence polymorphisms or carry different IDs.