Genome assembly of the milky mangrove Excoecaria agallocha

Curation statements for this article:
  • Curated by GigaByte

    Editors Assessment:

    This work is part of a series of papers from the Hong Kong Biodiversity Genomics Consortium sequencing the rich biodiversity of species in Hong Kong. This example assembles the genome of the milky mangrove Excoecaria agallocha, also known as the blind-your-eye mangrove because its toxic milky latex can cause blindness on contact with the eyes. Living in the brackish water of tropical mangrove forests from India to Australia, these trees are an extremely important habitat for a diverse variety of aquatic species, including the mangrove jewel bug, for whose larvae this tree is the sole food source. Using PacBio HiFi long reads and Omni-C technology, a 1,332.45 Mb genome was assembled, with 1,402 scaffolds and a scaffold N50 of 58.95 Mb. After feedback the annotations were improved, predicting a very high number (73,740) of protein-coding genes. The data presented here provide a valuable resource for further investigation of the biosynthesis of phytochemical compounds in its milky latex, which may have many medicinal and pharmacological properties. They also increase the understanding of biology and the evolution of genome architecture in the Euphorbiaceae family and in mangrove species adapted to high levels of salinity.

    This evaluation refers to version 1 of the preprint

Abstract

The milky mangrove Excoecaria agallocha is a latex-secreting mangrove that is distributed in tropical and subtropical regions. While its poisonous latex is regarded as a potential source of phytochemicals for biomedical applications, the genomic resources for E. agallocha remain limited. Here, we present a chromosome-level genome of E. agallocha, assembled from a combination of PacBio long-read sequencing and Omni-C data. The resulting assembly is 1,332.45 Mb in size and has high contiguity and completeness, with a scaffold N50 of 58.9 Mb and a BUSCO score of 98.4%. A total of 73,740 protein-coding genes were also predicted. The milky mangrove genome provides a useful resource for further understanding the biosynthesis of phytochemical compounds in E. agallocha.
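
For readers unfamiliar with the contiguity metric quoted in the abstract, the following is a minimal sketch of how a scaffold N50 is usually computed: the length of the scaffold at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly size. The function name `scaffold_n50` and the toy lengths are illustrative assumptions and are not taken from the article or its data.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (any unit)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up scaffold lengths, for illustration only (not E. agallocha data).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(scaffold_n50(toy_lengths))
```

In practice the lengths would be read from the assembly FASTA index rather than typed in by hand.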

Article activity feed

    This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.119), and the reviews are published under the same license. It is part of a thematic series presenting Data Releases from the Hong Kong Biodiversity Genomics Consortium (https://doi.org/10.46471/GIGABYTE_SERIES_0006). The reviews are as follows.

    Reviewer 1. Minghui Kang

    Is the data acquisition clear, complete and methodologically sound?

    The sample collection site needs to include latitude and longitude data.

    Is there sufficient detail in the methods and data-processing steps to allow reproduction?

    Please add the software version number to all the software mentioned in the manuscript. Additionally, if the software uses default parameters, please provide the corresponding description. If specific parameters are used, please indicate the corresponding parameters.

    Additional Comments: This study presents the assembly of an Excoecaria agallocha genome using PacBio HiFi and Omni-C technologies. The assembly exhibits good contiguity and completeness, providing a valuable resource for further understanding the phylogenetic position, evolutionary history, and natural product biosynthesis in Excoecaria agallocha. However, there are still some issues that need to be addressed and modified, including the following points:

    L82 It would be preferable to mention the number of chromosomes and the anchor rate of the chromosome-scale assembly here, as well as the estimated genome size based on K-mer analysis, to further support the accuracy and completeness of the assembly.

    L88 I think the authors need to rearrange the order of the figures, as it is not appropriate for Fig. 1F to appear before Fig. 1A. Please check the results part and arrange the pictures in a reasonable order.

    L117 The sample collection site needs to include latitude and longitude data.

    L187 Please add the software version number to all the software mentioned in the manuscript. Additionally, if the software uses default parameters, please provide the corresponding description. If specific parameters are used, please indicate the corresponding parameters.

    L219 The pseudochromosome scaffolding rate of 86.08% appears to be somewhat low (<90%). The sequences that were not scaffolded onto chromosomes could be a result of untrimmed redundancy in the genome assembly or could indicate some assembly errors.

    L220 Please note that in this instance, Fig. 1C appears before Fig. 1B in the text. I kindly request the author to review and adjust the numbering and arrangement of figures throughout the entire manuscript.

    L223 The quality of gene annotation appears to be significantly lower than the quality of genome assembly (82.1%/98.4%), indicating poor gene annotation accuracy. Please review the accuracy of the HMM model trained by the Augustus software or consider using a more accurate annotation workflow.

    L225 Unclassified repetitive sequences account for over 50% of the total repetitive sequences, which can significantly impact subsequent analyses relying on repetitive sequences. It is recommended to use alternative software, such as The Extensive de novo TE Annotator (EDTA), which provides more accurate classification and utilizes a more comprehensive repetitive sequence library, to validate these results.
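
As a rough illustration of the scaffolding (anchoring) rate raised in the L219 comment, the sketch below computes the percentage of assembled bases that fall in the pseudochromosomes. It assumes, purely for illustration, that the pseudochromosomes are the longest scaffolds in the assembly; the function name `anchoring_rate` and the toy lengths are hypothetical and do not come from the manuscript.

```python
def anchoring_rate(scaffold_lengths, n_chromosomes):
    """Percentage of assembled bases held in the n_chromosomes longest scaffolds.

    Assumption: the pseudochromosomes are the longest scaffolds.
    """
    if not scaffold_lengths or n_chromosomes < 1:
        raise ValueError("need scaffold lengths and at least one chromosome")
    ordered = sorted(scaffold_lengths, reverse=True)
    anchored = sum(ordered[:n_chromosomes])
    return 100.0 * anchored / sum(ordered)


# Made-up lengths: three "chromosomes" plus two unplaced scaffolds.
toy_lengths = [400, 300, 200, 60, 40]
print(f"{anchoring_rate(toy_lengths, 3):.2f}%")
```

A rate below a chosen threshold, such as the 90% the reviewer mentions, would then prompt a check for residual redundancy or misassembly among the unplaced scaffolds.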

    Reviewer 2. Dr. Jarkko Salojarvi

    Is the language of sufficient quality? Yes.

    Are all data available and do they match the descriptions in the paper? Yes.

    Are the data and metadata consistent with relevant minimum information or reporting standards? Yes.

    Is the data acquisition clear, complete and methodologically sound? Yes.

    Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes.

    Is there sufficient data validation and statistical analyses of data quality? Yes.

    Is the validation suitable for this type of data? Yes.

    Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes.