Improvements in the sequencing and assembly of plant genomes

Priyanka Sharma
Othman Al-Dossary
Bader Alsubaie
Ibrahim Al-Mssallem
Onkar Nath
Neena Mitter
Gabriel Rodrigues Alves Margarido
Bruce Topp
Valentine Murigneux
Ardashir Kharabian Masouleh
Agnelo Furtado
Robert J. Henry

Abstract

Advances in DNA sequencing have made it easier to sequence and assemble plant genomes. Here, we extend an earlier study and compare recent methods for long-read sequencing and assembly. Updated Oxford Nanopore Technologies software improved assemblies. Using more accurate sequences produced by repeated sequencing of the same molecule (Pacific Biosciences HiFi) resulted in less fragmented assemblies. Increasing genome coverage resulted in longer contigs, but reduced total assembly length and improved genome completeness. The original model species, Macadamia jansenii, was also compared with three other Macadamia species, as well as avocado (Persea americana) and jojoba (Simmondsia chinensis). In these angiosperms, increasing sequence data volumes caused a linear increase in contig size, decreased assembly length and further improved already high completeness. Differences in genome size and sequence complexity influenced the success of assembly. Advances in long-read sequencing technology continue to improve plant genome sequencing and assembly. However, results were improved by greater genome coverage, with the amount needed to achieve a particular level of assembly being species dependent.

GigaByte
Aug 23, 2021

Improvements in the Sequencing and Assembly of Plant Genomes

This manuscript is an Update to a paper published in *GigaScience* in December 2020. See https://doi.org/10.1093/gigascience/giaa146

GigaByte
Aug 23, 2021
Background

This work has been peer reviewed in GigaByte, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

Chao Bian

Is the language of sufficient quality? No

Are all data available and do they match the descriptions in the paper? Yes

Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide Yes

Is the data acquisition clear, complete and methodologically sound? Yes

Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes

Is there sufficient data validation and statistical analyses of data quality? Yes

Is the validation suitable for this type of data? Yes

Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes
GigaByte
Aug 23, 2021
Abstract

This work has been peer reviewed in GigaByte, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

Mile Šikić

Is the language of sufficient quality? Yes

Are all data available and do they match the descriptions in the paper? Yes

Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples (http://gigadb.org/site/guide) Yes

Is the data acquisition clear, complete and methodologically sound? Yes

Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes

Is there sufficient data validation and statistical analyses of data quality? Yes

Is the validation suitable for this type of data? Yes

Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes

Additional Comments: In their update to the previous study on the comparison of long read technologies for sequencing and assembly of plant genomes, Sharma et al. presented a follow-up analysis using a newer generation of base callers for nanopore reads and PacBio HiFi reads. I argue that this study is an important update, but it is not suitable for publication in the current form.

My major comments are the following:

It is not clear which version of the base caller the authors used in assemblies related to Table 1 and Table 3.

For phased assemblies, it is important to provide information about the size of the alternative contigs.

In Table 1, it would be great to have results for methods that do not phase the assembly (e.g. Flye).

There is no explanation of why the authors use IPA instead of other HiFi assemblers, e.g. hifiasm, which, in my experience, performs better than IPA.

A sentence related to Table 3, “The quality of the assemblies was more contiguous with less data in each of these cases when HiFi reads were used instead of the earlier continuous long reads (Table 3).” is not clear. According to Table 3, assemblies achieved using continuous long reads have similar or longer N50 and higher BUSCO scores. Also, it is not clear which assembler was used for the long reads.
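For readers less familiar with the contiguity metric discussed in this comment, the sketch below shows one common way contig N50 is computed: the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length. It is a minimal illustration only; the function name `n50` and the contig lengths are hypothetical and are not taken from the manuscript or its tables.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    # Sort from longest to shortest so the largest contigs are counted first.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that brings the running total to half the
        # assembly length defines the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths (bp) for two assemblies of the same total size.
fragmented = [40, 30, 20, 10]
contiguous = [70, 20, 10]
print(n50(fragmented), n50(contiguous))
```

Two assemblies with the same total length can therefore have very different N50 values, which is why N50 is usually reported alongside total assembly length and a completeness measure such as BUSCO.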
Version published to 10.46471/gigabyte.24
Jun 10, 2021
Version published to 10.1101/2021.01.22.427724 on bioRxiv
Jan 22, 2021
