Improvements in the sequencing and assembly of plant genomes

Abstract

Advances in DNA sequencing have made it easier to sequence and assemble plant genomes. Here, we extend an earlier study and compare recent methods for long-read sequencing and assembly. Updated Oxford Nanopore Technologies software improved assemblies. Using more accurate sequences produced by repeated sequencing of the same molecule (Pacific Biosciences HiFi) resulted in less fragmented assemblies. Using data for increased genome coverage resulted in longer contigs, but reduced total assembly length and improved genome completeness. The original model species, Macadamia jansenii, was also compared with three other Macadamia species, as well as avocado (Persea americana) and jojoba (Simmondsia chinensis). In these angiosperms, increasing sequence data volumes caused a linear increase in contig size, decreased assembly length and further improved already high completeness. Differences in genome size and sequence complexity influenced the success of assembly. Advances in long-read sequencing technology continue to improve plant genome sequencing and assembly. However, results were improved by greater genome coverage, with the amount needed to achieve a particular level of assembly being species-dependent.

Article activity feed

  1. Background

    This work has been peer reviewed in GigaByte, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Chao Bian

    1. Is the language of sufficient quality? No

    2. Are all data available and do they match the descriptions in the paper? Yes

    3. Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples (http://gigadb.org/site/guide) Yes

    4. Is the data acquisition clear, complete and methodologically sound? Yes

    5. Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes

    6. Is there sufficient data validation and statistical analyses of data quality? Yes

    7. Is the validation suitable for this type of data? Yes

    8. Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes

  2. Abstract

    Mile Šikić

    1. Is the language of sufficient quality? Yes

    2. Are all data available and do they match the descriptions in the paper? Yes

    3. Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples (http://gigadb.org/site/guide) Yes

    4. Is the data acquisition clear, complete and methodologically sound? Yes

    5. Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes

    6. Is there sufficient data validation and statistical analyses of data quality? Yes

    7. Is the validation suitable for this type of data? Yes

    8. Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes

    Additional Comments: In their update to the previous study on the comparison of long read technologies for sequencing and assembly of plant genomes, Sharma et al. presented a follow-up analysis using a newer generation of base callers for nanopore reads and PacBio HiFi reads. I argue that this study is an important update, but it is not suitable for publication in the current form.

    My major comments are the following:

    1. It is not clear which version of the base caller the authors used in assemblies related to Table 1 and Table 3.
    2. For phased assemblies, it is important to provide information about the size of the alternative contigs.
    3. In Table 1, it would be great to have results for methods that do not phase the assembly (e.g. Flye).
    4. There is no explanation of why the authors use IPA instead of other HiFi assemblers, e.g. hifiasm, which in my experience performs better than IPA.
    5. The sentence related to Table 3, “The quality of the assemblies was more contiguous with less data in each of these cases when HiFi reads were used instead of the earlier continuous long reads (Table 3).”, is not clear. According to Table 3, the assemblies obtained using long reads have a similar or longer N50 and a higher BUSCO score. Also, it is not clear which assembler was used for the long reads.
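
For readers unfamiliar with the N50 statistic referred to in comment 5, the following is a minimal sketch of how it is commonly computed: the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total assembly length. The function name `n50` and the contig lengths are hypothetical illustrations, not values from the manuscript or its tables.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2             # half of the assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths (bp) for two assemblies of equal total length.
fragmented = [400, 300, 200, 100]
contiguous = [700, 200, 100]

print(n50(fragmented), n50(contiguous))  # prints "300 700"
```

Both hypothetical assemblies total 1,000 bp, but the one with fewer, longer contigs has the higher N50, which is why the statistic is used as a measure of contiguity when comparing assemblies.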