A highly contiguous, scaffold-level nuclear genome assembly for the fever tree (Cinchona pubescens Vahl) as a novel resource for Rubiaceae research

Abstract

The Andean fever tree (Cinchona L.; Rubiaceae) is a source of bioactive quinine alkaloids used to treat malaria. C. pubescens Vahl is a valuable cash crop within its native range in northwestern South America; however, genomic resources are lacking. Here we provide the first highly contiguous and annotated nuclear and plastid genome assemblies using Oxford Nanopore PromethION-derived long-read and Illumina short-read data. Our nuclear genome assembly comprises 603 scaffolds with a total length of 904 Mbp (∼82% of the full genome based on a genome size of 1.1 Gbp/1C). Using a combination of de novo and reference-based transcriptome assemblies, we annotated 72,305 coding sequences comprising 83% of the BUSCO gene set and 4.6% fragmented sequences. Using additional plastid and nuclear datasets, we place C. pubescens in the Gentianales order. This first genomic resource for C. pubescens opens new research avenues, including the analysis of alkaloid biosynthesis in the fever tree.

Article activity feed

  1. Abstract

    This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.71), and the reviews have been published under the same license. These are as follows.

    Reviewer 1. John Hamilton

    Are all data available and do they match the descriptions in the paper?

    Yes. Downloaded and checked from the Gigabyte FTP site

    Are the data and metadata consistent with relevant minimum information or reporting standards?

    Yes. I was unable to check sequence data deposited in the SRA.

    Is the data acquisition clear, complete and methodologically sound?

    Yes, but summary tables are missing for the Illumina WGS and RNA-Seq sequencing in the manuscript.

    Is there sufficient detail in the methods and data-processing steps to allow reproduction?

    No. Some parts of the manuscript are very good in this respect, and some parts (especially annotation) are missing parameters and core details.

    Is there sufficient data validation and statistical analyses of data quality?

    No. Especially the analysis of the quality/completeness of the genome annotation.

    Is the validation suitable for this type of data?

    Yes. Where it is not missing, it is suitable.

    Additional Comments:

    In this manuscript, Canales et al. present the long-read-based assembly and annotation of the genome of the fever tree (Cinchona pubescens), well known as the source of quinine alkaloids traditionally used to treat malaria. This will be a genome of interest and a welcome resource for the community. I enjoyed reading this manuscript about this interesting species and I have several comments:

    1. There is no summary table for the Illumina WGS and the three RNA-Seq libraries. This should be added.

    2. Since you have Illumina WGS short reads, it would be informative to add a Genomescope kmer plot (http://qb.cshl.edu/genomescope/) to section 1.3 as an additional estimate of genome size and heterozygosity.

    3. The BUSCO metrics for the assembly are lower than expected. I believe this is due to the lack of sufficient genome polishing. Refer to the Solanum pennellii genome paper (https://doi.org/10.1105/tpc.17.00521), where they used a similar assembly strategy and discuss the need for adequate polishing (see “Prior to Polishing, Genome Error Rate Is Substantial”).

    4. Section 1.6: It is noted that PASA describes transcript evidence as ESTs, which is a legacy from the time it was developed, but the RNA-seq transcript assemblies are then also described as ESTs later in the section, which is incorrect and confusing.

    5. There is no assessment of the annotation, just a statement of the number of CDSs predicted. This is an issue, as the number of CDSs is far higher than reported in related species. Repeat masking of the genome assembly is not discussed, so I am assuming AUGUSTUS was run on the unmasked assembly with no downstream filtering or refinement. Doing this increases the number of TE-related gene models and annotation artifacts. As this is a data note/data release, there should really be at a minimum:
       a. A table summarizing the annotation in the manuscript
       b. An analysis to identify models with evidence support
       c. BUSCO results for the annotation
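As background to comment 2, the basic idea behind a k-mer-based genome size estimate can be sketched in a few lines. This is only a simplified illustration: GenomeScope itself fits a statistical model to the k-mer spectrum and also reports heterozygosity, which this sketch does not attempt. The function name, the `min_depth` error cutoff, and the toy histogram are assumptions made for illustration and are not taken from the manuscript or the review.

```python
def estimate_genome_size(kmer_histogram, min_depth=5):
    """Rough haploid genome size estimate from a k-mer depth histogram.

    kmer_histogram maps depth -> number of distinct k-mers observed at that
    depth. Depths below min_depth are treated as sequencing errors (an
    assumed cutoff) and ignored.
    """
    kept = {d: n for d, n in kmer_histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth at which the most distinct k-mers occur.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences: each distinct k-mer counted once per copy seen.
    total_kmers = sum(d * n for d, n in kept.items())
    # Genome size is approximately total k-mer occurrences / peak depth.
    return total_kmers / peak_depth


# Toy histogram with made-up numbers (not Cinchona data).
toy_histogram = {1: 5000, 2: 800, 19: 100, 20: 1000, 21: 100, 40: 50}
size = estimate_genome_size(toy_histogram)
print(size)
```

In a heterozygous genome the tallest peak may be the heterozygous one rather than the homozygous one, which is one reason a model-based tool is preferable to this shortcut.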
    

    Re-review: I’ve read the authors’ responses to all the reviewer comments and read the updated manuscript, and I am satisfied with the changes made.

    Reviewer 2. Bing Bing Liu

    I was very pleased to read your article on the fever tree's genomes, and I think it is a very valuable foundational work. The assembled genome recovered ~85% (903M or 904M, Table 1) of the estimated genome size (1.1 Gb/1C) with an N50 = 2802128 bp; 72,305 CDSs were annotated and 83% (or 87.6%, line 207) of BUSCOs were recovered, but these statistics are reported inconsistently in the study. It is also necessary to provide the repeat annotations, functional annotations and non-coding RNA annotations. In addition, no more than 90% of the BUSCOs were recovered; you should explain why.
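The two headline statistics questioned here are simple to recompute. The sketch below is an illustration under stated assumptions, not the authors' pipeline. The function names and the toy scaffold lengths are hypothetical; only the 904 Mbp assembly length and the 1.1 Gbp genome size come from the abstract.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together contain at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def fraction_recovered(assembly_bp, genome_size_bp):
    """Fraction of the estimated genome size covered by the assembly."""
    return assembly_bp / genome_size_bp


# Toy scaffold lengths (hypothetical, for illustration only).
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_lengths)

# Values taken from the abstract: 904 Mbp assembled, 1.1 Gbp/1C genome size.
recovered = fraction_recovered(904e6, 1.1e9)
print(toy_n50, round(recovered * 100, 1))
```

With the figures from the abstract, the recovered fraction rounds to 82.2%, which matches the ∼82% stated there rather than the ~85% quoted in this review.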
    

    Minor comments

    Lines 34-43: you should add the plastid genome results here.
    Line 38: check the genome size. Maybe 904M?
    Lines 41 and 207: you gave two different percentages of BUSCOs; please check.
    Lines 144, 145 and 149: the numbers of reads and bases do not correspond, given that the read length is 150 bp; please check.
    Line 172: I have doubts about the overall mapping rate (7.34%).
    Line 198: you should add a description of the genome produced by RACON.
    Lines 204 and 223: why do you use different versions of the BUSCO software?
    Lines 206-207: why did you not give the mapping ratio?
    Line 243: you should provide the BUSCO result for the proteins (CDSs).
    Lines 257 and 280: why do you use different versions of MAFFT?
    Line 289: check the sentence ‘had a BS of 100%’.
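The read-count check requested for lines 144, 145 and 149 amounts to a single multiplication. The sketch below shows one way to test it. The function names, the 1% tolerance, and the example counts are hypothetical; the manuscript's actual figures are not reproduced in this review. It also assumes that the reported count refers to individual reads rather than read pairs.

```python
def expected_bases(read_count, read_length=150):
    """Bases expected if every read has exactly read_length bases."""
    return read_count * read_length


def is_consistent(read_count, reported_bases, read_length=150, tolerance=0.01):
    """True if reported_bases is within a relative tolerance of the
    read_count * read_length expectation."""
    expected = expected_bases(read_count, read_length)
    return abs(reported_bases - expected) <= tolerance * expected


# Hypothetical examples (not the manuscript's numbers).
ok = is_consistent(1_000_000, 150_000_000)   # matches 150 bp reads
bad = is_consistent(1_000_000, 300_000_000)  # off by a factor of two
print(ok, bad)
```

A factor-of-two mismatch often means that read pairs were counted as single reads. If the counts were taken after quality trimming, the base total can legitimately fall below the expectation, so the tolerance would need loosening in that case.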