The draft nuclear genome assembly of Eucalyptus pauciflora: a pipeline for comparing de novo assemblies

Abstract

Background

Eucalyptus pauciflora (the snow gum) is a long-lived tree of high economic and ecological importance. Currently, little genomic information is available for E. pauciflora. Here, we assemble the genome of E. pauciflora with several different methods and combine multiple existing and novel approaches to help select the best genome assembly.

Findings

We generated high-coverage long-read (Nanopore, 174×) and short-read (Illumina, 228×) data from a single E. pauciflora individual and compared assemblies from 5 assemblers (Canu, SMARTdenovo, Flye, Marvel, and MaSuRCA) built with two minimum read-length cutoffs (1 kb and 35 kb). A key component of our approach is to set aside a randomly selected ∼10% of both the long and the short reads, excluding them from assembly so that they can serve as a validation set for assessing the assemblies. Using this validation set along with a range of existing tools, we compared the assemblies in 8 ways: contig N50, BUSCO scores, LAI (long terminal repeat assembly index) scores, assembly ploidy, base-level error rate, CGAL (computing genome assembly likelihoods) scores, structural variation, and genome sequence similarity. Our results showed that MaSuRCA generated the best assembly, which is 594.87 Mb in size, with a contig N50 of 3.23 Mb and an estimated error rate of ∼0.006 errors per base.
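Two steps in this paragraph can be made concrete with a short sketch: withholding a random ∼10% of reads from assembly as a validation set, and the standard definition of contig N50 (the length L such that contigs of length ≥ L together contain at least half of the total assembly length). This is an illustrative Python sketch under simplifying assumptions, not the authors' pipeline: the function names `split_validation` and `contig_n50`, the fixed seed, and representing reads as a plain list of read IDs (rather than parsing FASTQ files) are all hypothetical choices made for the example.

```python
import random
from typing import List, Sequence, Tuple


def split_validation(read_ids: Sequence[str], fraction: float = 0.10,
                     seed: int = 1) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: randomly withhold `fraction` of reads.

    Returns (assembly_reads, validation_reads). The two lists are disjoint
    and together contain every input read exactly once.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the split is reproducible
    n_val = round(len(read_ids) * fraction)
    # Choose read positions (not IDs) so duplicate IDs cannot break the split.
    val_idx = set(rng.sample(range(len(read_ids)), n_val))
    assembly = [r for i, r in enumerate(read_ids) if i not in val_idx]
    validation = [r for i, r in enumerate(read_ids) if i in val_idx]
    return assembly, validation


def contig_n50(lengths: Sequence[int]) -> int:
    """Contig N50: the length L such that contigs of length >= L
    contain at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no contigs")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids rounding issues with total / 2.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(1000)]
    assembly_reads, validation_reads = split_validation(reads, 0.10, seed=42)
    print(len(assembly_reads), len(validation_reads))  # 900 100
    print(contig_n50([100, 80, 50, 30, 20]))  # 80
```

In the N50 example the total length is 280, so the threshold is 140; the two longest contigs (100 + 80 = 180) are the first to reach it, giving an N50 of 80. For paired-end short reads, a split like this would presumably need to sample read pairs rather than individual reads so that mates stay together; that detail is an assumption not covered by this sketch.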

Conclusions

We report a draft genome of E. pauciflora, which will be a valuable resource for further genomic studies of eucalypts. Our approaches for assessing and comparing genome assemblies should help researchers choose among the many potential assemblies that can be generated from a single dataset.

    Now published in GigaScience, doi: 10.1093/gigascience/giz160

    Weiwen Wang 1, Ashutosh Das 1,2, David Kainer 1, Miriam Schalamun 1,3, Alejandro Morales-Suarez 4, Benjamin Schwessinger 1

    1 Research School of Biology, the Australian National University, Canberra, Australia
    2 Department of Genetics and Animal Breeding, Faculty of Veterinary Medicine, Chittagong Veterinary and Animal Sciences University, Chittagong, Bangladesh
    3 Institute of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences, Vienna, Austria
    4 Department of Biological Sciences, Macquarie University, Sydney, Australia

    For correspondence: wei.wang@anu.edu.au, rob.lanfear@anu.edu.au

    A version of this preprint has been published in the Open Access journal GigaScience (see https://doi.org/10.1093/gigascience/giz160), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

    These peer reviews were as follows:

    Reviewer 1: http://dx.doi.org/10.5524/REVIEW.102038
    Reviewer 2: http://dx.doi.org/10.5524/REVIEW.102039