Sequencing smart: De novo sequencing and assembly approaches for a non-model mammal

Abstract

Background

Whilst much sequencing effort has focused on key mammalian model organisms such as mouse and human, little is known about how the choice of sequencing technique affects genome assembly quality in non-model mammals, where the samples to be sequenced are often degraded and of low quality. A key aspect when planning a genome project is the choice of sequencing data to generate. This decision is driven by several factors, including the biological questions being asked, the quality of DNA available, and the availability of funds. Cutting-edge sequencing technologies now make it possible to achieve highly contiguous, chromosome-level genome assemblies, but they rely on high-quality, high molecular weight DNA, and funding is often insufficient for many independent research groups to use them. Here we use data from a range of genomic technologies, generated from a roadkill European polecat (Mustela putorius), to assess various assembly techniques on this low-quality sample. We evaluate different approaches to de novo assembly and discuss their value in relation to biological analyses.

Results

Generally, assemblies built from more data types achieved better scores in our ranking system. However, when accounting for misassemblies, this was not always the case for Bionano and low-coverage 10x Genomics (for scaffolding only). We also found that the extra cost of combining multiple data types did not necessarily yield better genome assemblies.

Conclusions

The high degree of variability between de novo assembly methods (assessed using 7 key metrics) highlights the importance of carefully devising a sequencing strategy that supports the desired analysis. Adding more data to genome assemblies does not always result in better assemblies, so it is important to understand the nuances of genomic data integration explained here in order to obtain value for money when sequencing genomes.

Article activity feed

Now published in GigaScience doi: 10.1093/gigascience/giaa045

Graham J Etherington, Darren Heavens, David Baker, Ashleigh Lister, Rose McNelly, Gonzalo Garcia, Bernardo Clavijo, Iain Macaulay, Wilfried Haerty, Federica Di Palma

The Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, United Kingdom

For correspondence: graham.etherington@earlham.ac.uk, Federica.dipalma@earlham.ac.uk

A version of this preprint has been published in the Open Access journal GigaScience (see https://doi.org/10.1093/gigascience/giaa045), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

These peer reviews were as follows:

Reviewer 1: http://dx.doi.org/10.5524/REVIEW.102226

Reviewer 2: http://dx.doi.org/10.5524/REVIEW.102227

Related articles

A Benchmarking Framework to Catalyze Individual Human Genome Projects
Manjushri kalpande, Apoorva Ganesh, Subhashini Srinivasan

Shotgun metagenomics: a deep insight into the composition and function of the complex microbial world
Grazia Visci, Elisabetta Notario, Giuseppe Defazio, Mariano Francesco Caratozzolo, Bruno Fosso, Marinella Marzano, Graziano Pesole

Understanding Pathways in Bioinformatics, Genomics, and Health Applications
Diptarup Mallick