A highly contiguous, scaffold-level nuclear genome assembly for the Fever tree (Cinchona pubescens Vahl) as a novel resource for research in the Rubiaceae

Abstract

Background

The Andean Fever tree (Cinchona L.; Rubiaceae) is the iconic source of bioactive quinine alkaloids, which have been vital to treating malaria for centuries. C. pubescens Vahl, in particular, has been an essential source of income for several countries within its native range in north-western South America. However, no genomic resources have been available for the genus, although they are essential for placing the Cinchona species within the tree of life and for setting the foundation for exploring the evolution and biosynthesis of quinine alkaloids.

Findings

We address this gap by providing the first highly contiguous and annotated nuclear and organelle genome assemblies for C. pubescens, using a combination of ∼120 Gb of long sequencing reads derived from the Oxford Nanopore PromethION platform and 142 Gb of short-read Illumina data. Our nuclear genome assembly comprises 603 scaffolds with a total length of 904 Mb, representing ∼85% of the genome size (1.1 Gb/1C). This draft genome sequence was complemented by annotating 72,305 CDSs using a combination of de novo and reference-based transcriptome assemblies. Completeness analysis revealed that our assembly is moderately complete, displaying 83% of the BUSCO gene set and a small fraction of genes (4.6%) classified as fragmented. Additionally, we report the C. pubescens plastome, with a length of ∼157 kb and a GC content of 37.74%. We demonstrate the utility of these novel genomic resources by placing C. pubescens in the Gentianales order using additional plastid and nuclear datasets.

Conclusions

Our study provides the first genomic resource for C. pubescens, thus opening new research avenues, including the provision of crucial genetic resources for analysis of alkaloid biosynthesis in the Fever tree.

Article activity feed

  1. Abstract

    This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.71), and the reviews have been published under the same license. They are as follows.

    Reviewer 1. John Hamilton

    Are all data available and do they match the descriptions in the paper?

    Yes. Downloaded and checked from the GigaByte FTP site.

    Are the data and metadata consistent with relevant minimum information or reporting standards?

    Yes. I was unable to check sequence data deposited in the SRA.

    Is the data acquisition clear, complete and methodologically sound?

    Yes, but summary tables are missing for the Illumina WGS and RNA-Seq sequencing in the manuscript.

    Is there sufficient detail in the methods and data-processing steps to allow reproduction?

    No. Some parts of the manuscript are very good in this respect, and some parts (esp. annotation) are missing parameters and core details.

    Is there sufficient data validation and statistical analyses of data quality?

    No. This applies especially to the analysis of the quality/completeness of the genome annotation.

    Is the validation suitable for this type of data?

    Yes. Where it is not missing, it is suitable.

    Additional Comments:

    In this manuscript, Canales et al. present the long-read based assembly and annotation of the genome of the fever tree (Cinchona pubescens), well known as the source of quinine alkaloids traditionally used to treat malaria. This will be a genome of interest and a welcome resource for the community. I enjoyed reading this manuscript about this interesting species and I have several comments:

      1. There is not a summary table for the Illumina WGS and the three RNA-Seq libraries. This should be added.

      2. Since you have Illumina WGS short reads, it would be informative to add a Genomescope kmer plot (http://qb.cshl.edu/genomescope/) as an additional estimate of genome size and heterozygosity to section 1.3.

      3. The BUSCO metrics for the assembly are lower than expected. I believe this is due to the lack of sufficient genome polishing. Refer to the Solanum pennellii genome paper (https://doi.org/10.1105/tpc.17.00521), where they used a similar assembly strategy and discuss the need for adequate polishing (see “Prior to Polishing, Genome Error Rate Is Substantial”).

      4. Section 1.6: It is noted that PASA describes transcript evidence as ESTs, which is a legacy from the time it was developed, but the RNA-seq transcript assemblies are then also described as ESTs later in the section, which is incorrect and confusing.

      5. There is not an assessment of the annotation, just a statement of the number of CDSs predicted. This is an issue, as the number of CDSs is far higher than reported in related species. There is not a discussion of repeat masking the genome assembly, so I am assuming AUGUSTUS was run on the unmasked assembly with no downstream filtering or refinement. Doing this increases the number of TE-related gene models and annotation artifacts. As this is a data note/data release, there should really be at a minimum:

         a. A table summarizing the annotation in the manuscript

         b. An analysis to identify models with evidence support

         c. BUSCO results for the annotation
    

    Re-review: I’ve read the authors’ responses to all the reviewer comments and read the updated manuscript, and I am satisfied with the changes made.

    Reviewer 2. Bing Bing Liu

    I was very pleased to read your article on the Fever tree's genomes, and I think it is a very valuable foundational work. The assembled genome recovered ~85% (903M or 904M, Table 1) of the estimated genome size (1.1 Gb/1C) with an N50 of 2,802,128 bp; 72,305 CDSs were annotated and 83% (or 87.6%, line 207) of BUSCOs were recovered, but there is a lack of clarity around these statistics in the study. It is also necessary to provide the repeat annotations, functional annotations and non-coding RNA annotations. In addition, since no more than 90% of BUSCOs were recovered, you should explain why.
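
    The contiguity and recovery statistics discussed in this comment can be recomputed directly from scaffold lengths, which is one way to resolve ambiguities such as 903M versus 904M. The following is a minimal sketch, assuming the scaffold lengths are already available as a list of integers; the function names and the toy numbers are hypothetical and are not taken from the manuscript.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fraction_recovered(assembly_bp, estimated_genome_bp):
    """Assembled length as a fraction of the estimated genome size."""
    return assembly_bp / estimated_genome_bp


# Toy scaffold lengths (hypothetical, not the real assembly).
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))                 # 40
print(fraction_recovered(850, 1000))    # 0.85
```

    Reporting the exact total length and the exact genome size estimate used in the denominator alongside the percentage makes the recovered fraction reproducible.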
    

    Minor comments

    Lines 34-43: you should add the plastid genome results here.

    Line 38: check the genome size. Maybe 904M?

    Lines 41 and 207: you gave two different percentages of BUSCOs, please check.

    Lines 144, 145 and 149: the numbers of reads and bases do not correspond, please check, as the read length is 150 bp.

    Line 172: I have doubts about the overall mapping rate (7.34%).

    Line 198: you should add a description of the genome produced by RACON.

    Lines 204 and 223: why did you use different versions of the BUSCO software?

    Lines 206-207: why did you not give the result of the mapping ratio?

    Line 243: you should provide the BUSCO result for the proteins (CDSs).

    Lines 257 and 280: why did you use different versions of MAFFT?

    Line 289: check the sentence ‘had a BS of 100%’.
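
    The read/base check requested for lines 144, 145 and 149 is simple arithmetic: for fixed-length reads, the total number of bases should equal the number of reads multiplied by the read length. A minimal sketch follows, assuming untrimmed 150 bp reads and that the count refers to individual reads rather than read pairs; the function names, the tolerance and the example counts are hypothetical.

```python
def expected_bases(n_reads, read_length_bp=150):
    """Total bases expected from n_reads reads of fixed length."""
    return n_reads * read_length_bp


def is_consistent(n_reads, reported_bases, read_length_bp=150, tolerance=0.01):
    """True if reported_bases is within a relative tolerance of
    n_reads * read_length_bp."""
    expected = expected_bases(n_reads, read_length_bp)
    return abs(reported_bases - expected) <= tolerance * expected


# Hypothetical counts, not taken from the manuscript.
print(is_consistent(1_000_000, 150_000_000))  # True
print(is_consistent(1_000_000, 300_000_000))  # False
```

    A mismatch by a factor of two often just means that one figure counts read pairs and the other counts individual reads, so the summary table should state which is reported.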