Sequenoscope: A Modular Tool for Nanopore Adaptive Sequencing Analytics and Beyond
Abstract
This paper presents Sequenoscope: a bioinformatics pipeline for analyzing Oxford Nanopore Technologies (ONT) adaptive sampling sequencing data. Sequenoscope features three main modules: filter_ONT for filtering raw reads and creating a FASTQ file with a subset of reads for further analyses, analyze for generating sequencing and read mapping statistics against the provided reference taxon sequences, and plot for interactive data summarization, comparison, and visualization between two datasets/test conditions. Here we demonstrate the ability of the pipeline to analyze ONT adaptive sampling sequence data and provide examples of the outputs users can expect using data we generated. Adaptive sampling was performed on two ZymoBIOMICS Microbial Community DNA Standards, log-distributed (Cat# D6311) and even-distributed (Cat# D6306) formulations, with targeted depletions of Listeria monocytogenes. By comparing the test and control experimental data in FASTQ files from the sequencing runs, Sequenoscope showed that depletion of L. monocytogenes was successful by providing users with parameters to compare such as taxon coverage, read length, and types of pore-level decisions made during sequencing. Although Sequenoscope was designed for ONT adaptive sampling data analysis, it supports short-read data from other sequencing platforms such as Illumina, allowing for the direct comparison of any two experimental conditions or cross-platform benchmarking.
Article activity feed
-
-
Dear Authors, the reviewers have now considered your revised manuscript and note that you have addressed their comments, resulting in a substantially improved submission. Only minor issues remain, primarily relating to clarity, completeness of reporting, and small textual or figure-related corrections. Please note that reviewer #2 contributed to reviewer #1's review. On this basis, the manuscript is suitable for publication pending minor revisions. I encourage you to address these remaining points and I look forward to receiving your revised manuscript in due course.
-
Comments to Author
We commend the authors for their thorough and constructive response to the reviewers' comments and for the substantial improvements made to the manuscript. Many of the previously raised issues have been adequately addressed. Importantly, aspects that could not yet be resolved have been explicitly acknowledged and deferred to future versions of the tool. This transparency is appreciated, as it demonstrates careful consideration by the authors and allows readers to better assess the current scope and limitations of the work. The provided installation options via Nextflow and conda were tested, functioned as expected, and are user-friendly, which significantly enhances the reusability of the tool. We have a small number of remaining minor comments:
1. The manuscript now appears to include results generated using an Illumina dataset. However, these results are not mentioned in the Results and Discussion section. As the use of Illumina data is described in the Materials and Methods, the corresponding results should be at least briefly reported in the main text, even if the majority of the analysis is presented in the Supplementary Materials.
2. Line 111: The sentence "For the results presented in this manuscript, we used the following tool versions enclosed in parenthesis." is incomplete. No tool versions are actually listed, and the phrase "enclosed in parenthesis" is vague and grammatically incorrect. The sentence should either be followed immediately by the list of tool versions or removed altogether. If the intention was merely to state that version numbers are provided throughout the manuscript, this is unnecessary, as reporting software versions is standard practice.
3. Line 292: There is a typographical error in the sentence "With the references selected, ,the analyze module …", where an extra comma should be removed.
4. Figures 2 and 3: The resolution of these figures is too low to clearly read the axis labels and legend text. Axis titles, tick labels, and legend text should be increased in size to ensure readability.
5. For the Plotly software reference, consider avoiding the abbreviation of "Plotly Technologies Inc." The abbreviated form appears unusual, and software citations should follow a consistent and conventional format.
Advice for future versions: As the pipeline already calculates the number of sequenced bases per species, it would be straightforward to also compute fold enrichment values. This metric would provide a concise and quantitative measure of adaptive sampling efficiency and would strengthen the evaluation of the method.
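The fold enrichment advice above can be illustrated with a minimal sketch. It assumes one common definition, which neither the manuscript nor the review confirms: a taxon's share of all sequenced bases in the test (adaptive sampling) condition divided by its share in the control condition. The function name, the dictionary layout, and the base counts are hypothetical and are not part of Sequenoscope.

```python
def fold_enrichment(test_bases, control_bases):
    """Return, per taxon, (share of bases in test) / (share of bases in control).

    Assumed definition for illustration only. Values above 1 suggest
    enrichment; values below 1 suggest depletion.
    """
    test_total = sum(test_bases.values())
    control_total = sum(control_bases.values())
    if test_total == 0 or control_total == 0:
        raise ValueError("both conditions need at least one sequenced base")

    ratios = {}
    for taxon, control_count in control_bases.items():
        if control_count == 0:
            # The ratio is undefined when the taxon is absent from the control.
            continue
        test_share = test_bases.get(taxon, 0) / test_total
        control_share = control_count / control_total
        ratios[taxon] = test_share / control_share
    return ratios


# Hypothetical base counts, not taken from the manuscript.
control = {"depleted_taxon": 900, "other_taxon": 100}
test = {"depleted_taxon": 300, "other_taxon": 100}

ratios = fold_enrichment(test, control)
for taxon, ratio in sorted(ratios.items()):
    print(f"{taxon}\t{ratio:.2f}")
```

Working on base shares rather than raw counts keeps the ratio comparable when the two conditions differ in total yield, which is typical when channels are split between adaptive and regular sequencing.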
Please rate the manuscript for methodological rigour
Satisfactory
Please rate the quality of the presentation and structure of the manuscript
Satisfactory
To what extent are the conclusions supported by the data?
Strongly support
Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?
No
Is there a potential financial or other conflict of interest between yourself and the author(s)?
No
If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?
Yes
-
Comments to Author
I contributed to Wim Cuypers' review.
Please rate the manuscript for methodological rigour
Good
Please rate the quality of the presentation and structure of the manuscript
Good
To what extent are the conclusions supported by the data?
Strongly support
Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?
No
Is there a potential financial or other conflict of interest between yourself and the author(s)?
No
If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?
Yes
-
-
Comments to Author
I contributed to Wim Cuypers' review.
Please rate the manuscript for methodological rigour
Satisfactory
Please rate the quality of the presentation and structure of the manuscript
Good
To what extent are the conclusions supported by the data?
Partially support
Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?
No
Is there a potential financial or other conflict of interest between yourself and the author(s)?
No
If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?
Yes
-
Comments to Author
General comments
----------------------
Sequenoscope enables the comparison of two sets of basecalled ONT (Oxford Nanopore Technologies) sequencing data: one generated using regular sequencing, and one with adaptive sampling. The primary aim is to lower the barrier for adoption of ONT adaptive sampling by offering a straightforward framework to compare both sequencing strategies. This is a very nice concept, and the tool has potential to be a valuable contribution for researchers working with adaptive sampling and nanopore sequencing, especially for optimising these processes. At present, a clear and formalised way to compare adaptive and regular sequencing is lacking, and this tool could fill that gap. However, several issues with the tool's utility, documentation and implementation currently limit its usability. Addressing these would significantly improve the scientific quality of the manuscript as well as the user experience and the impact of the tool.

---

Specific comments
----------------------
Line 41: "Raw sequence data has been deposited under NCBI BioProject PRJNA1051081"
- The term 'raw sequence data' is ambiguous in the context of nanopore sequencing. It can also refer to pre-basecalled signal data (fast5/pod5 files). It would be helpful to clarify that this refers to basecalled reads in FASTQ format.

Line 69:
- Consider replacing "on target" with "on-target" for consistency with common terminology.

Line 140 onwards:
- Software versions are not reported. If the pipeline does not use fixed versions, please specify somewhere in the methods which versions were used for the analysis.
- Why use *fastp*? While widely used for Illumina reads, *fastp* may not be optimal for long-read data. Consider using *fastplong* (https://github.com/OpenGene/fastplong), which is designed for nanopore reads.

Line 141: "minimap2 (Li, 2018) (default parameters)"
- Although you mention using "default parameters," this isn't entirely accurate. You're actually using parameters appropriate for long nanopore reads (e.g. `-ax map-ont`). It would be helpful to clarify this in the text.
- Additionally, could you explain why a k-mer size of 15 was chosen for indexing the reference? Was this based on a specific consideration or benchmarking? A brief justification would improve transparency.

Line 145 - Estimated genome size and coverage:
- It is unclear how MASH is used here. If it's only to estimate reference genome size, why not use the exact sequence length from the reference itself? Also, if coverage is calculated from mapped reads (e.g. via BAM files), why would you still need to run MASH on the FASTQ files? From the code, it seems MASH is run on trimmed reads (fastp output), but the manuscript refers to reference genomes. Clarifying the workflow and logic here would improve reproducibility and understanding.
- Furthermore, Table 2, Table 3, and supplementary tables S1-S12 contain redundant or misleading information: *estimated genome size*, *estimated coverage*, *total bases*, *total bases after trimming*, and *mean read length* are identical across all rows. This risks misinterpretation and diminishes the utility of the tables. Please check and revise.

Line 173 - Section 6.4.1:
- Consider including a reference to an example plot. This would help readers connect the narrative with a visual result.

Lines 226-228: "The large eukaryotic genomes of Cryptococcus neoformans and Saccharomyces cerevisiae were excluded from the reference FASTA."
- This may result in false-positive mappings to closely related regions (see https://www.nature.com/articles/s41564-025-02035-2). Please clarify the rationale and potential consequences.

Lines 186-202: "The ONT sequencing summary file logs pore-level 'end_decision' values, which map to three main outcomes: [...] 1. 'Stop_receiving' [...] 2. 'Unblocked' [...] 3. 'No_decision' [...]"
- This description is a bit confusing, particularly when compared to the definitions provided in De Vries et al. ONT's adaptive sampling output (`adaptive_sampling_summary.txt`) includes a `decision` column with three possible values: `stop_receiving`, `unblock`, and `no_decision`. As written, the three main outcomes listed in lines 188-204 seem to correspond directly to these adaptive sampling decisions. However, this appears to be an oversimplification. The De Vries et al. paper (cited here) states: "The AS outcome can be derived from the sequencing summary that provides the end reason for every read [...] Reads that are rejected in the AS process are labeled as 'data_service_unblock_mux_change,' whereas reads that are completely sequenced without intervention are classified as 'signal_positive.' These may be either 'no_decision' or accepted ('stop_receiving') reads." This means that `signal_positive` reads may originate from *both* the `stop_receiving` and `no_decision` categories. From what I understand, Sequenoscope currently assigns all `signal_positive` reads to the `stop_receiving` group. If this is correct, it could lead to misclassification and misinterpretation of adaptive sampling outcomes, particularly in figures 4 and 5. I suggest clarifying in the manuscript how these categories are defined and mapped from the ONT metadata. If `signal_positive` reads are used as a proxy for accepted reads, it would be helpful to explicitly acknowledge the limitations of this approach, especially the fact that `signal_positive` includes reads for which no adaptive sampling decision was made. Providing additional detail in the methods section and clarifying this in figure legends or supplementary material would help avoid confusion and ensure that users interpret the results correctly.

Line 268: "this taxon" or "this species"
- Consider clarifying for precision.

Line 269: "Under two percent homology…"
- This explanation seems speculative. Adaptive sampling rejection would result in characteristic read lengths (~400-500 bp), due to the time needed for a decision. Without knowing input fragment sizes per species, it is hard to conclude that low homology alone explains this observation. Could this instead be a stochastic effect? You're comparing a highly abundant species to a low-abundance one. When adaptive sampling rejects reads from the abundant species, more pores may be available to sequence other fragments. This could allow more molecules (particularly shorter ones, which are sequenced faster) to pass through, leading to a shift in the read length distribution. Some discussion of this alternative explanation would be helpful. In fact, you see this pattern repeated: Enterococcus faecalis shows shorter average read lengths in the adaptive sampling condition compared to regular sequencing. However, even in regular sequencing, Enterococcus has the shortest average read length. This suggests that the fragment size of *Enterococcus* may have been shorter prior to library prep, possibly due to DNA quality or shearing. This might be worth checking with the manufacturer (Zymo) or available documentation.

Line 324:
- How is 'coverage' defined here? Is it the same '1X coverage' as mentioned before? If so, reusing that term will help avoid confusion between coverage and read depth.
- In general, it would be helpful to check that the terms 'coverage', 'coverage depth' (L115), '1X coverage', etc. are used consistently throughout the manuscript, and to define these terms.

---

References
-------------
- Line 482: The reference to De Vries et al. is outdated. It has since been published (https://rnajournal.cshlp.org/content/29/12/1939).
- Line 479: Viehweger et al. has also been published: https://gigabytejournal.com/articles/75

---

Software and installation
----------------------------
- Conda installation: The conda installation command does not work out-of-the-box due to dependency issues. Installation required several attempts and manual fixes. This should be addressed.
- PyPI: Currently installs an outdated version. Please update.
- Nextflow pipeline:
  - The `nf-sequenoscope` repository lacks basic usage instructions. Consider adding installation steps and an example run command, such as:

```bash
git clone https://github.com/phac-nml/nf-sequenoscope
nextflow run nf-sequenoscope/sequenoscope_analyze.nf -h
```

  - Ideally, provide a streamlined wrapper pipeline that includes all modules (filter, analyze, plot).
  - Based on the current structure, it seems like `workflow` blocks are defined inside `process` blocks. This is not valid in DSL2 and will not work. Please revise the pipeline structure accordingly.
  - Once the conda installation is stable, you could use the `conda` directive in the Nextflow pipeline for reproducibility.
  - I attempted to run the `filter_ONT` tool using the command below, based on the documentation and mock data provided:

```bash
nextflow run nf-sequenoscope/sequenoscope_filter_ONT.nf \
  --input_fastq mock_data/mock.fastq \
  --input_summary mock_data/mock_sequencing_summary.txt \
  -o mock_filter_ONT -min_ch 1 -max_ch 256
```

    This failed with an "Unknown option: -min_ch" error. This may be due to single-dash flags conflicting with Nextflow's syntax. It underlines the importance of including working examples and clarifying parameter usage in the documentation.

---

Suggestions for improvement of the manuscript and tool
----------------------------------------------------------------
- Real-time analysis: Using Nextflow's `watchPath` function could make the pipeline real-time. This might be useful for researchers monitoring an adaptive sampling run.
- Fold-enrichment calculation: Consider including a metric for fold-enrichment of target taxa (based on *on-target bases*, not read counts). You could check previous work on this (https://journals.asm.org/doi/full/10.1128/mbio.01967-23).
- Pipeline structure: There are currently three separate pipeline sections. If there is no strong reason for this, consider merging them into a single workflow that processes data from input to plots in one go.

This paper was co-reviewed with PhD student Ms Laura Raes.
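The `signal_positive` ambiguity raised in the specific comments above can be made concrete with a small sketch. It is not Sequenoscope's implementation. It assumes each read is a dictionary with `read_id` and `end_reason` keys (hypothetical field names modelled on the sequencing summary) and that an optional lookup from read ID to the adaptive sampling `decision` value may be available. When no decision is known, `signal_positive` reads are labelled with an explicitly ambiguous category instead of being counted as accepted.

```python
from collections import Counter

# End-reason labels as quoted from De Vries et al. in the review above.
REJECTED = "data_service_unblock_mux_change"
COMPLETED = "signal_positive"


def classify_read(end_reason, as_decision=None):
    """Map a read to an adaptive sampling outcome.

    as_decision is the value from the adaptive sampling decision column
    (stop_receiving, unblock, no_decision) when it is known for this read.
    """
    if as_decision is not None:
        # The explicit decision resolves the ambiguity.
        return as_decision
    if end_reason == REJECTED:
        return "unblock"
    if end_reason == COMPLETED:
        # Could be an accepted read or one with no decision made.
        return "stop_receiving_or_no_decision"
    return "other"


def tally_outcomes(reads, decisions=None):
    """Count outcomes for reads given as dicts with read_id and end_reason."""
    decisions = decisions or {}
    return Counter(
        classify_read(read["end_reason"], decisions.get(read["read_id"]))
        for read in reads
    )


# Hypothetical reads for illustration only.
reads = [
    {"read_id": "r1", "end_reason": "signal_positive"},
    {"read_id": "r2", "end_reason": "signal_positive"},
    {"read_id": "r3", "end_reason": "data_service_unblock_mux_change"},
]
without_decisions = tally_outcomes(reads)
with_decisions = tally_outcomes(reads, {"r1": "stop_receiving", "r2": "no_decision"})
print(dict(without_decisions))
print(dict(with_decisions))
```

Keeping the ambiguous category separate makes the limitation visible in downstream plots: the accepted and no-decision counts only split apart once the adaptive sampling decisions are joined in by read ID.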
Please rate the manuscript for methodological rigour
Satisfactory
Please rate the quality of the presentation and structure of the manuscript
Good
To what extent are the conclusions supported by the data?
Partially support
Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?
No
Is there a potential financial or other conflict of interest between yourself and the author(s)?
No
If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?
Yes
-
Dear authors, thank you for submitting your manuscript to Access Microbiology. It has now been reviewed by three experts in the field, whose comments are attached at the bottom of this email. The Sequenoscope tool has been noted as timely and likely to be of interest to the wider scientific community. However, the reviewers noted that, in its current format, there are limitations around its utility. I invite the authors to address these concerns, as well as all other reviewer comments, to strengthen the manuscript towards publication. Please note that reviewers #1 and #2 submitted a joint review.
-
Comments to Author
In this manuscript, the authors present Sequenoscope, a specialized bioinformatics pipeline for analyzing Oxford Nanopore Technologies (ONT) adaptive sequencing data, addressing a gap in tools for evaluating targeted enrichment/depletion experiments. The modular Python-based tool features three core components: 1) filter_ONT for read subsetting based on flow-cell channels or quality metrics; 2) analyze for mapping reads to reference genomes and generating coverage statistics; and 3) plot for comparative visualization between experimental conditions using interactive Plotly graphs. Validation through Listeria monocytogenes depletion in ZymoBIOMICS microbial standards demonstrated compelling functionality. While adaptive sampling reduced target coverage by 63-67%, Sequenoscope's decision-tracking plots revealed significantly elevated "unblocked" read events in test conditions. The pipeline's strengths include its visualization capabilities, particularly the comparative coverage plots that toggle between linear/log scales to highlight depletion efficacy across taxa abundance levels, and the decision-tracking bar charts that quantify adaptive sampling dynamics over time. Further, the practical design enabling channel-splitting on a single flow cell minimizes technical variability, while outputs spanning per-read details to taxonomic summaries support granular diagnostics. Notably, the tool inadvertently revealed homology-driven off-target depletion of Enterococcus faecalis, underscoring its analytical sensitivity. However, limitations include incomplete validation of claimed cross-platform compatibility (e.g., Illumina data), methodological gaps such as unexplained exclusion of eukaryotic references, and reproducibility concerns from absent version details for dependencies and missing supplemental figures. The methodology section's excessive technicality also obscures key innovations.
In summary, Sequenoscope delivers a functional solution for a niche in the analysis of ONT adaptive sequencing data. The pipeline's modular architecture and interactive visualization capabilities demonstrate considerable promise for evaluating targeted enrichment/depletion experiments. While the manuscript is sound, I recommend the following points to strengthen methodological rigor and broaden applicability. Below are specific revisions required for publication.
1) The claimed compatibility with non-ONT platforms (e.g., Illumina short-read data) requires empirical validation. Incorporating comparative analysis of hybrid datasets would substantiate the pipeline's versatility beyond ONT-specific workflows. Additionally, testing Sequenoscope on clinical samples undergoing host-DNA depletion would significantly reinforce its diagnostic utility. Such validation would demonstrate real-world applicability in scenarios where host contamination overwhelms pathogen signals (a critical use case explicitly mentioned in the Introduction but not experimentally addressed).
2) The technical description of the pipeline's operation should be condensed to emphasize biological and analytical innovations rather than replicating software documentation. Crucially, the unintended depletion of E. faecalis (attributed to 1.96% homology with L. monocytogenes) necessitates proactive mitigation strategies. The authors should discuss how reference database optimization or k-mer threshold adjustments could minimize such off-target effects in future implementations. This would transform an observed limitation into a strength by providing actionable solutions to users.
3) Full reproducibility demands explicit versioning of all dependencies (e.g., Minimap2, MASH, fastp) used in benchmarking.
4) The Discussion should explicitly contrast Sequenoscope with existing tools, clarifying how its three-module design offers unique advantages for adaptive sampling quality control. Acknowledgment of methodological limitations is also essential. For example, the exclusion of eukaryotic references (Cryptococcus neoformans and Saccharomyces cerevisiae) requires justification, while homology-driven depletion artifacts warrant discussion as inherent challenges in adaptive sampling workflows. These additions would provide a more balanced perspective on the pipeline's scope.
Please rate the manuscript for methodological rigour
Good
Please rate the quality of the presentation and structure of the manuscript
Good
To what extent are the conclusions supported by the data?
Strongly support
Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?
No
Is there a potential financial or other conflict of interest between yourself and the author(s)?
No
If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?
Yes
-
