Biodiversity genomics of small metazoans: high quality de novo genomes from single specimens of field-collected and ethanol-preserved springtails

Abstract

Genome sequencing of all known eukaryotes on Earth promises unprecedented advances in evolutionary sciences, ecology, systematics and in biodiversity-related applied fields such as environmental management and natural product research. Advances in DNA sequencing technologies make genome sequencing feasible for many non-genetic model species. However, genome sequencing today relies on large quantities of high quality, high molecular weight (HMW) DNA, which is mostly obtained from fresh tissues. This is problematic for biodiversity genomics of Metazoa as most species are small and yield minute amounts of DNA. Furthermore, bringing living specimens to the lab bench is not realistic for the majority of species.

Here we overcome those difficulties by sequencing two species of springtails (Collembola) from single specimens preserved in ethanol. We used a newly developed, genome-wide amplification-based protocol to generate PacBio libraries for HiFi long-read sequencing.

The assembled genomes were highly continuous. They can be considered complete as we recovered over 95% of BUSCOs. Genome-wide amplification does not seem to bias genome recovery. The presence of almost complete copies of the mitochondrial genome in the nuclear genome was a pitfall for automatic assemblers. The genomes fit well into an existing phylogeny of springtails. A neotype is designated for one of the species, blending genome sequencing and creation of taxonomic references.

Our study shows that it is possible to obtain high quality genomes from small, field-preserved sub-millimeter metazoans, thus making their vast diversity accessible to the fields of genomics.

Article activity feed

  1. Genome sequencing

    Reviewer 2: Mahul Chakraborty

    Reviewer Comments to Author: In "Two high-quality de novo genomes from single ethanol-preserved specimens of tiny metazoans (Collembola)", Schneider et al. described de novo genome assemblies of two tiny field-collected Collembolan specimens. The authors collected high quality genomic DNA from the specimens following a Pacific Biosciences recommended protocol for ultra-low input libraries, amplified it, and generated adequate sequence coverage to generate contiguous assemblies. This is a significant step forward in generating de novo genome assemblies from small amounts of tissues and cells and will therefore be a useful guide not only for people who are studying whole organisms but also for people who are studying variation between cell or tissue types within an individual. I have some minor comments:

    • "They were preserved in 96% ethanol, kept at ambient-temperature for one day until they would be stored at -20°C for 1.5 months, until DNA extraction." - Was the preservation at -20°C a deliberate step to see the effect of this treatment on sequencing or just a conscious choice for specimen preservation?
    • The specific conditions used for the g-Tube shearing (e.g. the time and speed of centrifugation) need to be added in the Methods.
    • "Circularity was validated manually, and nucleotide bases were called with a 75% threshold Consensus.?" - Please clarify what the 75% threshold consensus is.
    • "We then performed another estimation of the genome size by dividing the number of mapped nucleotides by mode of the coverage distribution" - Why was this done? Did the authors suspect the GenomeScope estimate to be incorrect?
    • "We compared our new genomes sequenced to previous Collembola assemblies that were generated with long read and sometimes additional short read data." - This statement needs citations for the previous Collembola assemblies.
    • The authors used blastn and megablast to search for the beta-lactam synthesis genes in the new assembly. Tblastx might be more appropriate.
    • "For D. tigrina a total of 20,22 Gb HiFi data (Q>=20) was generated," - Do you mean 20.22?
    • "For S. aquaticus a total of Gb HiFi data (Q>=20) was generated" - The number before Gb is missing.
    • The authors report only one assembly from hifiasm, which I presume is the primary assembly. Given that the authors assembled diploid individuals, I am curious whether hifiasm assembled the alternate haplotype sequences.
    • "The insect genomes have higher BUSCO scores (96.5 and 99.6%), but lower contiguity (Table 2, Fig. 3)." - This statement is incorrect. A number of insect genomes are more contiguous than the assemblies presented here, including Drosophila melanogaster (PMID: 31653862) and several other Drosophila species, Anopheles stephensi (DOI:10.1101/2020.05.24.113019), and Anopheles albimanus (PMID: 32883756).
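
    The coverage-based genome size estimate quoted in the comment above (number of mapped nucleotides divided by the mode of the coverage distribution) can be illustrated with a minimal sketch. The function name `estimate_genome_size` and the toy depth values are hypothetical and chosen only for illustration; they are not taken from the manuscript, and the authors' actual tooling is not described here.

```python
from collections import Counter


def estimate_genome_size(per_base_depth):
    """Estimate genome size as (total mapped nucleotides) / (modal coverage).

    per_base_depth: iterable of integer read depths, one per reference position
    (hypothetical input, e.g. parsed from a per-base depth table).
    """
    # Drop zero-depth positions so uncovered regions cannot become the mode.
    covered = [d for d in per_base_depth if d > 0]
    if not covered:
        raise ValueError("no covered positions")
    # Summing per-base depth gives the total number of mapped nucleotides.
    total_mapped = sum(covered)
    # The mode of the depth distribution approximates single-copy coverage.
    mode_depth, _ = Counter(covered).most_common(1)[0]
    return total_mapped / mode_depth


# Toy example: most positions sit at depth 10; two positions at depth 20
# behave like a collapsed duplicate and add extra length to the estimate.
toy_depths = [0, 9, 10, 10, 10, 11, 10, 20, 20, 10]
print(estimate_genome_size(toy_depths))
```

    In this toy case nine positions are covered, yet the estimate is 11.0, because positions with roughly double the modal depth count twice. An estimate of this kind can therefore exceed the assembly length when repeats are collapsed, which is one possible reason to compute it alongside a k-mer based estimate.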
  2. ABSTRACT

    Reviewer 1. Arong Luo

    Reviewer Comments to Author: First, I'd like to commend the authors on attempting to sequence whole genomes of tiny metazoans, which account for a large part of biodiversity in nature and yet are difficult to sequence. Second, I am impressed by their use of ethanol-preserved specimens, which makes genome sequencing more applicable and attractive in practice. We must admit that sometimes we cannot use fresh specimens directly for genome sequencing. Thus, I think this manuscript is really of scientific significance for specific fields such as insects. I found that the focal part of their sequencing protocol throughout the text is the "whole genome amplification-based Ultra-Low DNA Input Workflow for SMRT Sequencing (PacBio)", which of course is very complex. So, I suggest the authors provide a flowchart showing the critical or main steps of their workflow, so that readers can easily understand it and refer to it in future projects.

    Finer points:

    • Line 35: I suggest providing specific/important information for the 'novel' protocol herein.
    • Lines 119-120: Were the specimens later used for DNA extraction also morphologically identified?
    • Lines 130-131: Was the DNA extract selected randomly or based on certain measurements?
    • Line 393: delete the dot '.'