Benchmarking long-read genome assemblers for three sequencing protocols and three agricultural species

Abstract

Long-read technologies now make it possible to produce telomere-to-telomere genome assemblies. These assemblies permit more accurate whole-genome analyses, including in repeated regions.

The genome assembly process can chain several steps, such as read correction, contig assembly, contig polishing, scaffolding, and gap filling, among others. Each step usually requires at least one software package and input sequences. For the end user, it is not necessarily clear which combination of tools and read types will work best for a given genome.

In this work, we focus on contig production, a central and complex task of the assembly process. It aims to produce the longest possible error-free sequences, called contigs, from reads. While it is possible to produce contigs from short reads, long reads are now widely preferred, since they produce much longer contigs.

We evaluate several contig-producing software packages (usually called assemblers) on long reads generated by two sequencers using three protocols, for three complex eukaryotic species with different characteristics. Our aim is twofold. First, we would like to give readers insight into the impact of sequencing technology and assembler combinations, to help them make their choice for a given genome. Second, we would like to present different assembly metrics and provide a critical view of their interpretation.
