Efficiently constructing complete genomes with CycloneSEQ to fill gaps in bacterial draft assemblies
Curation statements for this article:
Curated by GigaByte
Editors Assessment:
With the recent official launch of BGI’s new CycloneSEQ sequencing platform, which delivers long reads using novel nanopores, this paper presents benchmarking data and validation studies comparing short- and long-read data from other platforms and hybrid assemblies. This study tests the performance of the new platform in sequencing diverse microbial genomes, presenting raw and processed data to enable others to scrutinise and verify the work. Open peer review, together with scripts and protocols shared for the first time, provides transparency in this benchmarking process and increases trust in this new technology. In addition to benchmarking type strains, the technology was also tested with complex microbial communities, yielding complete metagenome-assembled genomes (MAGs) that were not achieved by short- or long-read assemblies alone. The study demonstrates that, by directly reading DNA molecules without fragmentation, CycloneSEQ delivers long-read data with impressive length and accuracy, filling gaps that short-read technologies alone cannot bridge. Future work includes expanding to real samples and fine-tuning the balance between short-read and long-read data for even faster, higher-quality assemblies.
This evaluation refers to version 1 of the preprint
Listed in
- Evaluated articles (GigaByte)
- Endorsed by GigaByte (scotted400)
Abstract
Current microbial sequencing relies on short-read platforms like Illumina and DNBSEQ, which are cost-effective and accurate but often produce fragmented draft genomes. Here, we used CycloneSEQ for long-read sequencing of ATCC BAA-835, producing long reads with an average length of 11.6 kbp and an average quality score of 14.4. Hybrid assembly with short-read data resulted in an error rate of only 0.04 mismatches and 0.08 indels per 100 kbp compared to the reference genome. This method, validated across nine species, successfully assembled complete circular genomes. Hybrid assembly significantly enhances genome completeness by using long reads to fill gaps and to accurately assemble multi-copy rRNA genes, which short reads alone cannot achieve. Data subsampling showed that combining over 500 Mbp of short-read data with 100 Mbp of long-read data yields high-quality circular assemblies. CycloneSEQ long reads improve the assembly of circular complete genomes from mixed microbial communities; however, their base quality needs improving. Integrating DNBSEQ short reads improved accuracy, resulting in complete and accurate assemblies.
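To put the reported numbers in context, here is a minimal Python sketch that converts the average quality score into a per-base error probability using the standard Phred relation, and scales the per-100-kbp rates to a whole genome. The genome length `GENOME_BP` is an approximate value assumed for illustration only, and the function names are hypothetical. How the "average quality score" was averaged is not stated in the abstract, so the conversion is indicative rather than exact.

```python
def phred_to_error_prob(q: float) -> float:
    """Per-base error probability implied by a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_errors(rate_per_100kbp: float, genome_bp: int) -> float:
    """Scale an error rate given per 100 kbp to a whole genome."""
    return rate_per_100kbp * genome_bp / 100_000


# Assumption: approximate genome length, used only for illustration.
GENOME_BP = 2_660_000

p = phred_to_error_prob(14.4)
print(f"Q14.4 -> per-base error probability {p:.4f}")
print(f"mismatches per genome: {expected_errors(0.04, GENOME_BP):.2f}")
print(f"indels per genome: {expected_errors(0.08, GENOME_BP):.2f}")
```

Under these assumptions, Q14.4 corresponds to roughly 3.6% per-base error in the raw long reads, while the hybrid assembly rates amount to about one mismatch and two indels across the whole genome.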
Article activity feed
-
Competing Interest Statement: CycloneSEQ was developed by BGI-Research and will be marketed as an advanced technology. All the authors are employees of BGI-Research and may potentially benefit from it.
See also the Ryan Wick Blog reviewing the preprint: https://rrwick.github.io/2024/12/17/cycloneseq.html
-
Abstract
Background: Current microbial sequencing relies on short-read platforms like Illumina and DNBSEQ, favored for their low cost and high accuracy. However, these methods often produce fragmented draft genomes, hindering comprehensive bacterial function analysis. Here, the sequencing performance and assembly improvements of CycloneSEQ, a novel long-read sequencing platform developed by BGI-Research, are evaluated.
Findings: Using CycloneSEQ long-read sequencing, the type strain produced long reads with an average length of 11.6 kbp and an average quality score of 14.4. After hybrid assembly with short-read data, the assembled genome exhibited an error rate of only 0.04 mismatches and 0.08 indels per 100 kbp compared to the reference genome. This method was validated across 9 diverse species, successfully assembling complete circular genomes. Hybrid assembly significantly enhances genome completeness by using long reads to fill gaps and accurately assemble multi-copy rRNA genes, which cannot be achieved by short reads alone. Through data subsampling, we found that over 500 Mbp of short-read data combined with 100 Mbp of long-read data can result in a high-quality circular assembly. Additionally, using CycloneSEQ long reads effectively improves the assembly of circular complete genomes from mixed microbial communities.
Conclusions: CycloneSEQ’s read length is sufficient for circular bacterial genomes, but its base quality needs improvement. Integrating DNBSEQ short reads improved accuracy, resulting in complete and accurate assemblies. This efficient approach can be widely applied in microbial sequencing.
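The subsampling thresholds in the abstract are stated in total bases, which can be easier to interpret as sequencing depth. The sketch below is illustrative only and is not the authors' procedure: `depth`, `subsample_to_bases` and the genome length `GENOME_BP` are hypothetical names and an assumed approximate value. It converts a base count to depth and randomly selects reads until a target number of bases is reached.

```python
import random


def depth(total_bases: int, genome_bp: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return total_bases / genome_bp


def subsample_to_bases(reads, target_bases, seed=0):
    """Randomly pick reads until their combined length reaches target_bases."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    order = list(range(len(reads)))
    rng.shuffle(order)
    picked, total = [], 0
    for i in order:
        if total >= target_bases:
            break
        picked.append(reads[i])
        total += len(reads[i])
    return picked


# Assumption: approximate genome length, used only for illustration.
GENOME_BP = 2_660_000

print(f"500 Mbp short reads: ~{depth(500_000_000, GENOME_BP):.0f}x")
print(f"100 Mbp long reads:  ~{depth(100_000_000, GENOME_BP):.0f}x")

# Toy example: fifty 1 kbp reads subsampled to 10 kbp.
toy_reads = ["A" * 1000 for _ in range(50)]
subset = subsample_to_bases(toy_reads, 10_000)
print(len(subset), "reads selected")
```

For paired-end short reads, a real subsampling step would need to keep both mates of each pair together, which this sketch does not handle.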
This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.154), and the reviews have been published under the same license.
Reviewer 1. Ryan Wick
This manuscript introduces CycloneSEQ data as a means for producing complete bacterial genome assemblies, with a focus on hybrid assemblies made using a combination of CycloneSEQ data and DNBSEQ data. It also publicly provides deep CycloneSEQ+DNBSEQ read sets for a range of bacterial species.
Major comments
The reads for the project were made publicly available via CNGBdb (https://db.cngb.org/search/project/CNP0006129), but I found it to be unusably slow (both the HTTP website and the FTP data downloads). To ensure the data is accessible to a wide audience, I request that it also be hosted in another location to make it available to readers. For example, SRA, ENA or GigaDB.
The paper makes no mention of the other major long-read platforms: Oxford Nanopore Technologies and Pacific Biosciences. Given the widespread use of these platforms (especially ONT) in bacterial genome assembly, some discussion on CycloneSEQ’s relative advantages or limitations would be beneficial.
Minor comments
Lines 100-103: this sentence (‘The GC content was sensitively affected…’) is not clear to me. How are the completeness and accuracy of the assembly affecting GC content?
Figure S2 unnecessarily includes reference-vs-reference difference counts, which are by definition zero.
Figure S2 could mention the genome (Akkermansia muciniphila ATCC BAA-835) in the caption – I did not immediately understand what 'for type strain' meant.
I found Figure 5 difficult to read, with its use of colour to indicate accuracy. This data would be better shown using another visualisation (e.g. bar plot) that more clearly shows quantitative values.
For the mixed microbial community analysis, it should be stated that Unicycler is exclusively designed for bacterial isolates (its documentation explicitly says to not use it on metagenomes).
Some of the supplementary figures are erroneously labelled 'Supplementary Table'.
Some stats on the metagenomic reads would be helpful: e.g. total bp for short and long reads, N50 for long reads, etc.
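As an illustration of the kind of read statistics requested here, the following minimal Python sketch computes read count, total bp, mean length and N50 from a list of read lengths. The function names are hypothetical, and the FASTQ helper assumes an uncompressed file with exactly four lines per record.

```python
def read_stats(lengths):
    """Return read count, total bp, mean length and N50 for a list of read lengths."""
    if not lengths:
        raise ValueError("no reads given")
    total = sum(lengths)
    running = 0
    n50 = 0
    # N50: length of the read at which the running sum of lengths,
    # taken from longest to shortest, first reaches half of all bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "reads": len(lengths),
        "total_bp": total,
        "mean_len": total / len(lengths),
        "n50": n50,
    }


def fastq_read_lengths(path):
    """Yield read lengths from an uncompressed, four-lines-per-record FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence is the second line of each record
                yield len(line.strip())


stats = read_stats([2, 3, 4, 5, 6])
print(stats)
```

For a real dataset, the lengths could be collected with `read_stats(list(fastq_read_lengths("reads.fastq")))`, where the file name is a placeholder.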
The methods describe using seqtk, but the reference for this (#25) is SeqKit (a different tool), so either the tool in the methods or the reference is wrong.
Re-review
Thank you for the revisions to the manuscript. While many of my minor comments have been addressed, I still have concerns regarding my major comments, which have not been fully resolved.
First, I appreciate that the data has now been made available on NCBI. However, the long-read datasets are labelled as Oxford Nanopore MinION data, which is misleading (example: SRR31850034). I understand this may be because SRA does not yet provide CycloneSEQ as a platform option, but this can be clarified through additional metadata. Specifically, the ‘design’ field for each SRA entry simply says ‘genome’, but it could have more detail, including that these are CycloneSEQ reads. The BioProject (PRJNA1194773) description could also include a clear statement that the long-read data is generated using CycloneSEQ.
Second, I had requested a brief discussion of existing long-read platforms (ONT and PacBio) to provide context on where CycloneSEQ fits into the broader sequencing landscape. The authors have chosen not to include this, stating that they do not have direct comparison data. While I understand that such a comparison is not the purpose of this paper, I still believe that some mention of these platforms is necessary in the Background and/or Discussion sections. This paper introduces a new long-read technology for bacterial genome assembly, and readers will naturally want to understand how it relates to widely used alternatives.
Finally, regarding my comment about supplementary figure labels, I still see the issue in the revised version provided for review. For example, the caption for Supplementary Figure S3 begins with ‘Supplementary Table S3.’ The authors stated that there were no errors, but this mislabelling remains in the PDF I received.
As these concerns remain unresolved, I do not consider the manuscript acceptable in its current form.
Reviewer 2. Keith Robison
As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code?
N/A - no software presented (relates to other software questions)
Additional comments: This is a useful presentation of an emerging sequencing platform.
Given the complex nature of nanopore signals and the difficulty of decoding them, it has been a pattern with the prior nanopore platform that improvements in basecalling software have yielded significant changes in basecalling performance. Therefore, it would be highly advantageous if the manuscript listed which specific versions / revision numbers of the basecalling software were used so that these results are properly contextualized for comparison to future results which may use newer basecalling software.
Ideally, the publication would include a link to a git (or similar) repository with the complete pipeline used to generate the results.