Improvements in the Sequencing and Assembly of Plant Genomes
This article has been reviewed by the following groups
Listed in
- Evaluated articles (GigaByte)
Abstract
Background
Advances in DNA sequencing have reduced the difficulty of sequencing and assembling plant genomes. A range of methods for long-read sequencing and assembly was recently compared; here we extend that earlier study and report a comparison with more recent methods.
Results
Updated Oxford Nanopore Technology software supported improved assemblies. The use of more accurate sequences produced by repeated sequencing of the same molecule (PacBio HiFi) resulted in much less fragmented assemblies. The use of more data to give increased genome coverage resulted in longer contigs (higher N50) but reduced the total length of the assemblies and improved genome completeness (BUSCO). The original model species, Macadamia jansenii, a basal eudicot, was also compared with the 3 other Macadamia species and with avocado (Persea americana), a magnoliid, and jojoba (Simmondsia chinensis), a core eudicot. In these phylogenetically diverse angiosperms, increasing sequence data volumes also caused a highly linear increase in contig size, decreased assembly length and further improved already high completeness. Differences in genome size and sequence complexity apparently influenced the success of assembly from these different species.
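As a reminder of what the contiguity metric above measures, the following is a minimal sketch of the standard N50 definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the example contig lengths are hypothetical illustrations and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (in bases)."""
    # Sort contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    # N50 is the length of the contig at which the running total
    # first reaches half of the total assembly length.
    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length


# Hypothetical example: two assemblies with the same total length (1000 bases).
fragmented = [100] * 10             # ten equal, short contigs
contiguous = [400, 300, 200, 100]   # fewer, longer contigs

print(n50(fragmented))   # 100
print(n50(contiguous))   # 300
```

Because N50 depends only on the distribution of contig lengths, it says nothing about correctness or completeness, which is why it is reported alongside total assembly length and BUSCO.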
Conclusions
Advances in long-read sequencing technology have continued to significantly improve the results of sequencing and assembly of plant genomes. However, results were consistently improved by greater genome coverage (using an increased number of reads), with the amount needed to achieve a particular level of assembly being species-dependent.
This manuscript is an Update to a paper published in *GigaScience* in December 2020. See https://doi.org/10.1093/gigascience/giaa146
-
Background
This work has been peer reviewed in GigaByte, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:
Chao Bian
Is the language of sufficient quality? No
Are all data available and do they match the descriptions in the paper? Yes
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide Yes
Is the data acquisition clear, complete and methodologically sound? Yes
Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes
Is there sufficient data validation and statistical analyses of data quality? Yes
Is the validation suitable for this type of data? Yes
Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes
-
Abstract
This work has been peer reviewed in GigaByte, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:
Mile Šikić
Is the language of sufficient quality? Yes
Are all data available and do they match the descriptions in the paper? Yes
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples (http://gigadb.org/site/guide) Yes
Is the data acquisition clear, complete and methodologically sound? Yes
Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes
Is there sufficient data validation and statistical analyses of data quality? Yes
Is the validation suitable for this type of data? Yes
Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes
Additional Comments: In their update to the previous study on the comparison of long read technologies for sequencing and assembly of plant genomes, Sharma et al. presented a follow-up analysis using a newer generation of base callers for nanopore reads and PacBio HiFi reads. I argue that this study is an important update, but it is not suitable for publication in the current form.
My major comments are the following:
- It is not clear which version of the base caller the authors used in assemblies related to Table 1 and Table 3.
- For phased assemblies, it is important to provide information about the size of the alternative contigs.
- In Table 1, it would be great to have results for methods that do not phase the assembly (e.g. Flye).
- There is no explanation of why the authors use IPA instead of other HiFi assemblers, e.g. hifiasm, which in my experience performs better than IPA.
- A sentence related to Table 3, “The quality of the assemblies was more contiguous with less data in each of these cases when HiFi reads were used instead of the earlier continuous long reads (Table 3).”, is not clear. According to Table 3, assemblies achieved using long reads have similar or longer N50 and higher BUSCO scores. Also, it is not clear which assembler was used for long reads.