Chromosomal-level genome assembly of golden birdwing Troides aeacus (Felder & Felder, 1860)

Curation statements for this article:
  • Curated by GigaByte

    Editors Assessment:

    This work is part of a series of papers from the Hong Kong Biodiversity Genomics Consortium sequencing the rich biodiversity of species in Hong Kong. This example presents the genome of the golden birdwing butterfly Troides aeacus (Lepidoptera, Papilionidae), a notable and popular species in Asia that faces habitat loss due to urbanization and human activities. The lack of genomic resources impedes conservation efforts based on genetic markers, as well as a better understanding of its biology. Using PacBio HiFi long reads and Omni-C, a 351 Mb genome was assembled and anchored to 30 pseudo-molecules. After reviewers requested more information on genome quality, the assembly was shown to have high sequence continuity, with contig length N50 = 11.67 Mb and L50 = 14, and scaffold length N50 = 12.2 Mb and L50 = 13. A total of 24,946 protein-coding genes were predicted. This study presents the first chromosomal-level genome assembly of the golden birdwing T. aeacus, a potentially useful resource for further phylogenomic studies of birdwing butterfly species in terms of species diversification and conservation. This evaluation refers to version 1 of the preprint.

Abstract

Troides aeacus, the golden birdwing (Lepidoptera, Papilionidae), is a large swallowtail butterfly widely distributed in Asia. Despite its wide occurrence, T. aeacus has been designated a major protected species in many places given the loss of its native habitats under urbanisation and anthropogenic activities. Nevertheless, the lack of genomic resources hinders our understanding of its biology and diversity, as well as the implementation of conservation measures based on genetic information or markers. Here, we report the first chromosomal-level genome assembly of T. aeacus using a combination of PacBio SMRT and Omni-C scaffolding technologies. The assembled genome (351 Mb) contains 98.94% of the sequences anchored to 30 pseudo-molecules. The genome assembly also has high sequence continuity with scaffold length N50 = 12.2 Mb. A total of 28,749 protein-coding genes were predicted, and high BUSCO score completeness (98.9% of BUSCO metazoa_odb10 genes) was also revealed. This high-quality genome offers a new and significant resource for understanding swallowtail butterfly biology, as well as for carrying out conservation measures for this ecologically important lepidopteran species.

Article activity feed

    This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.122), and the reviews have been published under the same license. It is part of a thematic series presenting Data Releases from the Hong Kong Biodiversity Genomics Consortium (https://doi.org/10.46471/GIGABYTE_SERIES_0006). The reviews are as follows.

    Reviewer 1. Dr. Kumar Saurabh Singh

    Are the data and metadata consistent with relevant minimum information or reporting standards?

    No.

    1. I've noticed that the genome assembly file has been uploaded to NCBI, but I couldn't locate the corresponding annotation files in GFF format. Additionally, I couldn't find gene models for Troides aeacus on NCBI or any other platform. As per the GigaScience data policy, these files should be made publicly available.
    2. The paper lacks information on the contig N50 and L50, although I did find this data on NCBI. Is there a specific reason for omitting the contig N50/L50 details from the main text or tables?
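
For reference, the contig and scaffold N50/L50 values discussed here follow directly from the list of sequence lengths. The following is a minimal sketch, assuming the lengths have already been extracted from the assembly FASTA; the function name `n50_l50` and the toy lengths are illustrative and are not taken from this assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    assembly length; L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example (hypothetical lengths, in bp)
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

Applying the same function once to contig lengths and once to scaffold lengths gives the two pairs of values that the reviewer asks to see side by side.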

    Is there sufficient detail in the methods and data-processing steps to allow reproduction?

    Yes.

    1. I have noticed that the QV value is missing for the given assembly. To assess the base-level accuracy of the assembly, the authors should calculate the consensus quality (QV) by comparing the frequency of k-mers present in the raw Omni-C reads (as the only short reads available are from Omni-C) with those present across the final assembly, perhaps using Merqury.
    2. Incorporating Omni-C data did not result in a significant increase in the contig N50. Have you identified any specific reasons for this outcome?
    3. The overall BUSCO completeness for proteins appears to be disproportionately low (~86%) compared to genomic completeness (~98%). Could this be attributed to the absence of RNA-seq data for predicting accurate gene models?
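
The k-mer-based QV requested in point 1 can be illustrated with a short calculation. This is a minimal sketch of the kind of estimate Merqury reports, assuming the k-mer counts have already been obtained with a k-mer counter: a k-mer found in the assembly but never in the reads is treated as evidence of a base error. The function name `kmer_qv` and all input numbers are hypothetical and do not describe the T. aeacus assembly.

```python
import math


def kmer_qv(asm_only_kmers, total_asm_kmers, k):
    """Estimate a Phred-scaled consensus quality (QV) from k-mer counts.

    asm_only_kmers:  k-mers present in the assembly but absent from the reads
    total_asm_kmers: all k-mers present in the assembly
    k:               k-mer size used for counting
    """
    if k <= 0 or total_asm_kmers <= 0:
        raise ValueError("k and total_asm_kmers must be positive")
    if not 0 <= asm_only_kmers <= total_asm_kmers:
        raise ValueError("asm_only_kmers must lie in [0, total_asm_kmers]")
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers observed
    if asm_only_kmers == total_asm_kmers:
        return 0.0  # every k-mer is unsupported
    ratio = asm_only_kmers / total_asm_kmers
    # Per-base error rate E = 1 - (1 - ratio)^(1/k), computed stably
    error_rate = -math.expm1(math.log1p(-ratio) / k)
    return -10 * math.log10(error_rate)


# Hypothetical counts for illustration only
qv = kmer_qv(asm_only_kmers=1_000, total_asm_kmers=350_000_000, k=21)
print(f"QV = {qv:.2f}")
```

Because Omni-C libraries do not cover the genome uniformly, a QV estimated from them may understate accuracy in poorly covered regions, so the result is best read as a lower bound.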

    Is there sufficient data validation and statistical analyses of data quality?

    I believe it's essential to assess the assembly quality through comparative genomic analyses, a component seemingly missing from the manuscript. While the text mentions the availability of genomic resources within the same genus, conducting a genome-wide comparison of these assemblies could provide valuable insights into the overall synteny and contiguity of the T. aeacus assembly. To ensure annotation consistency, it's important to compare genome assemblies by generating distributions of intron/exon lengths for annotations across multiple assemblies.
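
The intron/exon length comparison suggested above can be sketched as follows. This is a minimal illustration, assuming annotations in GFF3 format with `exon` features that carry a `Parent` attribute; the function name `exon_intron_lengths` and the toy records are hypothetical, and real annotation files may need extra handling (for example, GTF-style attributes).

```python
from collections import defaultdict


def exon_intron_lengths(gff_lines):
    """Collect exon and intron lengths from GFF3 lines.

    Exons are grouped by (sequence ID, parent transcript); introns are
    inferred as the gaps between consecutive exons of one transcript.
    """
    exons_by_parent = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        start, end = int(fields[3]), int(fields[4])
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons_by_parent[(fields[0], parent)].append((start, end))

    exon_lengths, intron_lengths = [], []
    for exons in exons_by_parent.values():
        exons.sort()
        # GFF3 coordinates are 1-based and inclusive
        exon_lengths.extend(end - start + 1 for start, end in exons)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
            gap = next_start - prev_end - 1
            if gap > 0:
                intron_lengths.append(gap)
    return exon_lengths, intron_lengths


# Toy GFF3 records (hypothetical transcript t1 on chr1)
toy_gff = [
    "##gff-version 3",
    "chr1\tsrc\texon\t1\t100\t.\t+\t.\tID=e1;Parent=t1",
    "chr1\tsrc\texon\t201\t300\t.\t+\t.\tID=e2;Parent=t1",
    "chr1\tsrc\texon\t501\t550\t.\t+\t.\tID=e3;Parent=t1",
]
exon_lens, intron_lens = exon_intron_lengths(toy_gff)
print(exon_lens, intron_lens)
```

Running this on the annotations of several related assemblies and comparing the resulting distributions (for example, medians or histograms) would show whether one annotation has systematically shorter or more fragmented gene models.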

    Reviewer 2. Dr. Xueyan Li

    Link to review: https://gigabyte-review.rivervalleytechnologies.com/download-api-file?ZmlsZV9wYXRoPXVwbG9hZHMvZ3gvRFIvNDk1L0dpZ2FieXRlRFJSLTIwMjQwMS0wMS1jb21tZW50cy5kb2N4

    Re-review: The paper has been substantially improved after the first revision. I suggest that this manuscript can be published after the following minor revisions: 1. L279: ‘formosanus’ is also part of the scientific name and should be set in italic type. 2. It is recommended to improve the presentation of the figures and tables.