A draft genome assembly of the eastern banjo frog Limnodynastes dumerilii dumerilii (Anura: Limnodynastidae)

Abstract

Amphibian genomes are usually challenging to assemble due to their large genome size and high repeat content. The Limnodynastidae is a family of frogs native to Australia, Tasmania and New Guinea. As an anuran lineage that successfully diversified on the Australian continent, it represents an important lineage in the amphibian tree of life but lacks reference genomes. Here we sequenced and annotated the genome of the eastern banjo frog Limnodynastes dumerilii dumerilii to fill this gap. The total length of the genome assembly is 2.38 Gb with a scaffold N50 of 285.9 kb. We identified 1.21 Gb of non-redundant sequences as repetitive elements and annotated 24,548 protein-coding genes in the assembly. BUSCO assessment indicated that more than 94% of the expected vertebrate genes were present in the genome assembly and the gene set. We anticipate that this annotated genome assembly will advance the future study of anuran phylogeny and amphibian genome evolution.

Article activity feed

  1. Now published in GigaByte, doi: 10.46471/gigabyte.2

    Qiye Li (1, 2), Qunfei Guo (1, 3), Yang Zhou (1), Huishuang Tan (1, 4), Terry Bertozzi (5, 6), Yuanzhen Zhu (1, 7), Ji Li (2, 8), Stephen Donnellan (5), Guojie Zhang (2, 8, 9, 10)

    1. BGI-Shenzhen, Shenzhen 518083, China
    2. State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
    3. College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
    4. Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 611731, China
    5. South Australian Museum, North Terrace, Adelaide 5000, Australia
    6. School of Biological Sciences, University of Adelaide, North Terrace, Adelaide 5005, Australia
    7. School of Basic Medicine, Qingdao University, Qingdao 266071, China
    8. China National Genebank, BGI-Shenzhen, Shenzhen 518120, China
    9. Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, 650223, Kunming, China
    10. Section for Ecology and Evolution, Department of Biology, University of Copenhagen, DK-2100 Copenhagen, Denmark

    For correspondence: guojie.zhang@bio.ku.dk

    This work has been peer reviewed in GigaByte, which carries out open, named peer review. These reviews are published under a CC-BY 4.0 license and were as follows:

    **Review 1. Walter Wolfsberger**

    Is the language of sufficient quality? Yes.

    Is the data all available and does it match the descriptions in the paper? Yes.

    Is the data and metadata consistent with relevant minimum information or reporting standards?

    Comment: The accession number for GigaDB provided in the paper does not yield any results in the GigaDB search. Using the species name works though.

    Is the data acquisition clear, complete and methodologically sound?

    Comment: Although the paper makes clear that a significant portion of the data was discarded during the early QC step, it gives no reason for this and does not describe the problem that was encountered. In total, the research group produced 396 Gb of raw sequence (211 Gb from short-insert and 185 Gb from long-insert libraries), of which only 180 Gb (130 Gb short-insert and a never-mentioned 55 Gb long-insert) were used later for the assembly. Running FastQC on a single library, I encountered extreme levels of sequence duplication, which might indicate that the libraries were not diverse or that a PCR artifact (such as overamplification) led to this low-quality initial data. The parameters for SOAPnuke, the tool used in early QC, are not defined.
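The duplication level described in this comment can be roughly approximated by counting exact-duplicate read sequences in a FASTQ file. The following is a minimal sketch and is not FastQC's own algorithm; it assumes unwrapped four-line FASTQ records, and the function names and toy reads are hypothetical, not taken from the paper or the review.

```python
import io
from collections import Counter
from typing import Iterable, Iterator


def read_sequences(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record holds the bases
            yield line.strip()


def duplicate_fraction(sequences: Iterable[str]) -> float:
    """Fraction of reads that are exact copies of another read already counted."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every distinct sequence contributes one "original"; the rest are duplicates.
    return 1.0 - len(counts) / total


# Hypothetical toy input: four reads, three of which share one sequence.
toy_fastq = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nTTGA\n+\nIIII\n"
    "@r4\nACGT\n+\nIIII\n"
)
print(duplicate_fraction(read_sequences(toy_fastq)))  # 0.5
```

Exact matching is deliberately simple: it ignores sequencing errors and does not consider read pairs jointly, so it is only a first indication of library complexity.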

    Is there sufficient detail in the methods and data-processing steps to allow reproduction?

    Is there sufficient data validation and statistical analyses of data quality? Yes.

    Is the validation suitable for this type of data?

    Comments: The assembly followed a logical order, with appropriate tools used at every step.

    Is there sufficient information for others to reuse this dataset or integrate it with other data?

    Comment: Although the resulting assembly was of moderate quality (highly fragmented, but with a good BUSCO score), a randomly picked library showed very high sequence duplication rates, which indicates that there might be problems for future data reuse. Addressing these issues, or at least acknowledging them, would benefit the whole report and the dataset.

    Additional Comments:

    I don't think physical coverage is widely used in genome assembly at present: given the nature of mate-pair reads, it inflates this statistic. I would put the resulting assembly statistics in a table that includes all of the metrics (N50, number of contigs, number of scaffolds, average contig length, etc.) and add the BUSCO score to the table, as the current formatting is not readable.
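To illustrate the two points in this comment, the sketch below computes the summary metrics that such a table would hold and contrasts sequence coverage with physical coverage. It is a minimal sketch under stated assumptions: all function names are hypothetical, and the numbers are toy values chosen for easy arithmetic, not figures from the paper.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Largest length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths: List[int]) -> Dict[str, float]:
    """Collect the metrics one would place in an assembly statistics table."""
    return {
        "count": len(lengths),
        "total_bp": sum(lengths),
        "mean_bp": sum(lengths) / len(lengths) if lengths else 0.0,
        "max_bp": max(lengths, default=0),
        "n50_bp": n50(lengths),
    }


def sequence_coverage(n_pairs: int, read_len: int, genome_size: int) -> float:
    """Bases actually sequenced per genome position (two reads per pair)."""
    return 2 * n_pairs * read_len / genome_size


def physical_coverage(n_pairs: int, insert_size: int, genome_size: int) -> float:
    """Genome positions spanned by whole inserts, whether sequenced or not."""
    return n_pairs * insert_size / genome_size


# Toy scaffold lengths (hypothetical).
print(summarize([100, 200, 300, 400])["n50_bp"])  # 300

# Toy mate-pair library: 1 M pairs, 100 bp reads, 5 kb inserts, 100 Mb genome.
print(sequence_coverage(1_000_000, 100, 100_000_000))   # 2.0
print(physical_coverage(1_000_000, 5_000, 100_000_000))  # 50.0
```

In this toy case the same library yields 2x sequence coverage but 50x physical coverage, which is the inflation the comment refers to: long inserts span far more of the genome than is actually read.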

    **Review 2. Nandita Mullapudi**

    Is the language of sufficient quality? Yes.

    Is the data all available and does it match the descriptions in the paper? Yes.

    Is the data and metadata consistent with relevant minimum information or reporting standards?

    Comment: I am unaware of defined reporting standards for assembly reports; however, all sample preparation, data generation and analysis methods have been described in adequate detail.

    Is the data acquisition clear, complete and methodologically sound? Yes.

    Is there sufficient detail in the methods and data-processing steps to allow reproduction?

    Comment: The following additional details would help to enable reproduction: (1) The parameters used for data pre-processing with SOAPnuke, as well as the related adapter sequences, etc. These would be necessary to reproduce the data clean-up step. (2) Memory, processor and time details of the computational resources used for assembly. (3) Was the Platanus assembly attempted using different parameters, and how were the parameters reported in the paper arrived at? (4) For gene prediction, several vertebrate sequences were used, but the details and source of these reference sequences are missing.

    Is there sufficient data validation and statistical analyses of data quality?

    Comments: 1) One approach to validating an assembly would be to use more than one assembly tool and compare the results. (This may or may not be within the scope of this study.) 2) With respect to the validation performed by mapping paired-end reads back to the assembly, there is no discussion of the ~14% of paired-end reads that did not map back in the expected orientation. Would tools like REAPR (https://www.sanger.ac.uk/science/tools/reapr) or SEQuel (https://bix.ucsd.edu/SEQuel/man.html) be appropriate to address this, given the high level of heterozygosity in *L. d. dumerilii* as reported here?
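The mapping-based validation discussed in point 2 is commonly summarized as the fraction of paired reads that the aligner flags as properly paired. The sketch below assumes the alignments are available as SAM text; the flag bit values follow the SAM specification, while the function name and the truncated toy records are hypothetical. How the authors actually computed their figure is not described here.

```python
from typing import Iterable

# FLAG bits defined by the SAM specification.
PAIRED = 0x1           # read is part of a pair
PROPER_PAIR = 0x2      # pair aligned as the aligner expects (orientation, distance)
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary alignment


def proper_pair_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of primary paired reads flagged as properly paired."""
    paired = 0
    proper = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        if flag & PAIRED:
            paired += 1
            if flag & PROPER_PAIR:
                proper += 1
    return proper / paired if paired else 0.0


# Hypothetical records, truncated to the first four columns for brevity.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t99\tscaf1\t100",   # paired, proper pair
    "r1\t147\tscaf1\t350",  # paired, proper pair
    "r2\t65\tscaf1\t500",   # paired, not proper
    "r2\t129\tscaf2\t40",   # paired, not proper
    "r1\t355\tscaf3\t10",   # secondary alignment, ignored
]
print(proper_pair_fraction(toy_sam))  # 0.5
```

Reads outside the properly paired set can reflect misassemblies, but also collapsed or duplicated haplotypes in a heterozygous genome, so this fraction is a starting point for inspection and not a verdict on its own.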

    Is the validation suitable for this type of data? Yes.

    Is there sufficient information for others to reuse this dataset or integrate it with other data?

    Comments: It may also be helpful to make available the set of cleaned reads, to enable reproduction of the assembly pipeline.