RiboSnake – a user-friendly, robust, reproducible, multipurpose and documentation-extensive pipeline for 16S rRNA gene microbiome analysis

Curation statements for this article:
  • Curated by GigaByte

    Editors Assessment:

    This new software paper presents RiboSnake, a validated, automated, reproducible analysis pipeline implemented in the popular Snakemake workflow management system for microbiome analysis. For analysing 16S rRNA gene amplicon sequencing data, it uses the widely used QIIME2 tool as the basis of the workflow, as it offers a wide range of functionality. Users of QIIME2 can be overwhelmed by the number of options at their disposal, and this workflow provides a fully automated and fully reproducible pipeline that can be easily installed and maintained. It provides an easy-to-navigate output accessible to non-bioinformatics experts, alongside sets of already validated parameters for different types of samples. Reviewers requested some clarification on testing, worked examples and documentation, and this was improved to produce a convincingly easy-to-use workflow. This will hopefully open up an already very established technique to a new group of users and assist them with reproducible science.

    This evaluation refers to version 1 of the preprint

Abstract

Background Next-generation sequencing for microbial communities has become a standard technique. However, the computational analysis remains resource-intensive. With declining costs and growing adoption of sequencing-based methods in many fields, validated, fully automated, reproducible and flexible pipelines are increasingly essential in various scientific fields. Results We present RiboSnake, a validated, automated, reproducible QIIME2-based pipeline implemented in Snakemake for analysing 16S rRNA gene amplicon sequencing data. RiboSnake includes pre-packaged validated parameter sets optimized for different sample types, from environmental samples to patient data. The configuration packages can be easily adapted and shared, requiring minimal user input. Conclusion RiboSnake is a new alternative for researchers employing 16S rRNA gene amplicon sequencing and looking for a customizable and user-friendly pipeline for microbiome analyses with in vitro validated settings. By automating the analysis with validated parameters for diverse sample types, RiboSnake enhances existing methods significantly. The workflow repository can be found on GitHub (https://github.com/IKIM-Essen/RiboSnake).

Article activity feed

  1. Editors Assessment:

    This new software paper presents RiboSnake, a validated, automated, reproducible analysis pipeline implemented in the popular Snakemake workflow management system for microbiome analysis. For analysing 16S rRNA gene amplicon sequencing data, it uses the widely used QIIME2 tool as the basis of the workflow, as it offers a wide range of functionality. Users of QIIME2 can be overwhelmed by the number of options at their disposal, and this workflow provides a fully automated and fully reproducible pipeline that can be easily installed and maintained. It provides an easy-to-navigate output accessible to non-bioinformatics experts, alongside sets of already validated parameters for different types of samples. Reviewers requested some clarification on testing, worked examples and documentation, and this was improved to produce a convincingly easy-to-use workflow. This will hopefully open up an already very established technique to a new group of users and assist them with reproducible science.

    This evaluation refers to version 1 of the preprint

  2. Abstract

    Background Next-generation sequencing for assaying microbial communities has become a standard technique in recent years. However, the initial investment required into in-silico analytics is still quite significant, especially for facilities not focused on bioinformatics. With the rapid decline in costs and growing adoption of sequencing-based methods in a number of fields, validated, fully automated, reproducible and yet flexible pipelines will play a greater role in various scientific fields in the future.

    Results We present RiboSnake, a validated, automated, reproducible QIIME2-based analysis pipeline implemented in Snakemake for the computational analysis of 16S rRNA gene amplicon sequencing data. The pipeline comes with pre-packaged validated parameter sets, optimized for different sample types. The sets range from complex environmental samples to patient data. The configuration packages can be easily adapted and shared, requiring minimal user input.

    Conclusion RiboSnake is a new alternative for researchers employing 16S rRNA gene amplicon sequencing and looking for a customizable and yet user-friendly pipeline for microbiome analysis with in-vitro validated settings. The complete analysis generated with a fully automated pipeline based on validated parameter sets for different sample types is a significant improvement to existing methods. The workflow repository can be found on GitHub (https://github.com/IKIM-Essen/RiboSnake).

    This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.132), and has published the reviews under the same license. These are as follows.

    Reviewer 1. Michael Hall

    Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

    Unable to test. The README states "If you want to test the RiboSnake functions yourself, you can use the same data used for the CI/CD tests." A worked example of how I can do this would be appreciated so I can test the workflow.

    Is there enough clear information in the documentation to install, run and test this tool, including information on where to seek help if required?

    The Usage instructions say to create a new repository using ribosnake as a template, but ribosnake is not a template repository (see https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-repository-from-a-template). The README states "If you want to test the RiboSnake functions yourself, you can use the same data used for the CI/CD tests." A worked example of how I can do this would be appreciated so I can test the workflow.

    Have any claims of performance been sufficiently tested and compared to other commonly-used packages?

    Not applicable.

    Is automated testing used or are there manual steps described so that the functionality of the software can be verified?

    Yes, though as mentioned above, the README states "If you want to test the RiboSnake functions yourself, you can use the same data used for the CI/CD tests." A worked example of how I can do this would be appreciated so I can test the workflow.

    Additional Comments:

    The Introduction could be made far more concise; there is a lot of repetition.

    The installation command in Figure 1 is three commands, not two as stated in the text (third-last paragraph of the Introduction), and is slightly misleading from an installation point of view, as it assumes conda and snakemake are already installed. It is, however, mentioned later in the text (p5) that snakemake and conda require manual installation.

    The in-text citation for Greengenes2 is just [?] - maybe a LaTeX issue?

    The last paragraph of the 'Features and Implementations' section was mostly already stated earlier in the manuscript.

    Make the colouring consistent between fig 2a-c and 2d as well as the vertical ordering to make for easier comparison. For example, in figures 2a-c Enterococcus (grey) is on the bottom, whereas in fig 2d it is red and in the middle. Colour legends should also be added to Figures 3-5 to match Fig 2.

    A small table should be added showing the comparison of RiboSnake and the original publication for the top 10 most abundant phyla for the Atacama soil dataset and their abundances (see last paragraph of 'Usage and Findings').

    Reviewer 2. Yong-Xin Liu and Salsabeel Yousuf

    The manuscript presented by the authors describes a comprehensive study of the “RiboSnake pipeline” for 16S rRNA gene microbiome analysis, which is user-friendly, robust, and multipurpose. RiboSnake, a validated, automated, reproducible QIIME2-based analysis pipeline implemented in Snakemake, offers parallel processing for efficient analysis of large datasets in both environmental and medical research contexts. Further demonstrating its effectiveness, the pipeline analyzes human-associated microbiomes and environmental samples such as wastewater and soil, thus expanding the scope of analysis for 16S rRNA data. The overall computational pipeline is useful and the results are sound, validated through rigorous testing on MOCK communities and real-world datasets. However, there are some issues for improvement in the manuscript.

    Major comments:

    1. In the clinical data section the authors mention that rectal swabs were used from a published study [31]. While the source is referenced, it would be helpful to know if any information was provided in the referenced study regarding the collection methods or storage conditions for the rectal swabs.
    2. The text mentions using cotton swabs pre-moistened with TE buffer + 0.5% Tween 20. While cotton swabs are common, are there any considerations for using different swab materials depending on the target analytes or sampling surface (e.g., flocked swabs for better epithelial cell collection)?
    3. Does RiboSnake require user intervention during any steps, or is it fully automated?
    4. The authors mention that contamination filtering parameters should be adjusted based on the sample type. How can users determine the appropriate filtering parameters for their specific samples? Are there guidelines for users to know how much adjustment is needed for specific scenarios?
    5. The default abundance threshold for filtering low-frequency reads is chosen based on Nearing et al. [44]. Please discuss the rationale behind using a single threshold for all sample types. Would it be beneficial to allow users to define this threshold based on their data characteristics?
    6. Would you like to explain the limitations of RiboSnake, such as specific types of samples it may not be suitable for or potential biases introduced by certain functionalities?
    7. The manuscript mentions various visualization tools used throughout the pipeline (QIIME2, qurro). Please clarify which types of data are visualized with each tool, and how users can access or customize these visualizations.
    8. To strengthen the manuscript's impact, consider discussing the specific novelty of RiboSnake compared to existing 16S rRNA gene microbiome analysis pipelines. Would you be able to elaborate on the unique features or functionalities of RiboSnake that address limitations of current methods?
    9. EasyAmplicon is a recently published pipeline that is easy to use on Windows, Mac and Linux systems.
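
    As an illustration of the user-defined threshold raised in major comment 5, the idea can be sketched in a few lines of Python. This is a minimal sketch and not RiboSnake's or QIIME2's implementation: the function name `filter_low_abundance`, the feature names and the two threshold values are hypothetical and chosen only to show how the retained features change with the threshold.

```python
from typing import Dict


def filter_low_abundance(counts: Dict[str, int], min_rel_abundance: float) -> Dict[str, int]:
    """Keep features whose share of all reads is at least min_rel_abundance.

    counts maps a feature ID to its total read count (hypothetical input format).
    min_rel_abundance is a fraction between 0 and 1, e.g. 0.01 for 1 %.
    """
    if not 0.0 <= min_rel_abundance <= 1.0:
        raise ValueError("min_rel_abundance must be a fraction between 0 and 1")
    total = sum(counts.values())
    if total == 0:
        # No reads at all: nothing can pass the filter.
        return {}
    return {
        feature: count
        for feature, count in counts.items()
        if count / total >= min_rel_abundance
    }


# Hypothetical feature counts (10,000 reads in total).
example_counts = {"ASV_1": 9000, "ASV_2": 950, "ASV_3": 45, "ASV_4": 5}

# A stricter and a more lenient user-chosen threshold (illustrative values only).
strict = filter_low_abundance(example_counts, 0.01)
lenient = filter_low_abundance(example_counts, 0.001)

print(sorted(strict))   # ['ASV_1', 'ASV_2']
print(sorted(lenient))  # ['ASV_1', 'ASV_2', 'ASV_3']
```

    Exposing the threshold as a parameter lets low-biomass and high-diversity samples be handled differently, which is the flexibility the comment asks about.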

    Minor comments:

    1. Reference is missing in this sentence. “The default is the SILVA database [47]. Greengenes2 [? ] can be used alternatively”.
    2. The authors should be careful about lowercase and uppercase throughout the manuscript. Please check the following for reference:  ..the 2017 published Atacama Soil data set with samples taken fromthe Atacama desert was used [32] as well as samples collected fromsoil under switchgrass published in [33].  based on an Euclidean beta diversitymetric, shows that the positive controls, as well as the samples taken from subjects 1 and 3 (S1 and S3), cluster together.  A wide range of diversity analysis parameters are available in QIIME2 and its associated tools. These include the Shannon diversity index to measure richness, the Pielou index tomeasure evenness, or perform standard correlation analysis using Pearson or Spearman indices, among others.
    3. In the Introduction, the sentence “However, while these methods enable 16S rRNA analysis with minimal user interaction…” needs attention for clarity. Consider separating it into two sentences to emphasize the limitations of existing pipelines compared to the described methods. Alternatively, using contrasting words like "in contrast" could highlight these differences.
    4. More detail in attached PDF.

    https://gigabyte-review.rivervalleytechnologies.com/download-api-file?ZmlsZV9wYXRoPXVwbG9hZHMvZ3gvVFIvNTM5L2d4LVRSLTE3MTY5Nzk4MTktcmV2aXNlZC5wZGY=
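
    For reference, the two diversity measures named in the passage quoted in minor comment 2 can be written out directly. The Shannon index is H' = -Σ p_i ln p_i, which responds to both the number of taxa and how evenly reads are spread among them, and Pielou's evenness is J' = H' / ln S, where S is the number of observed taxa. The Python sketch below is a plain re-implementation for illustration, not the code QIIME2 runs; it uses the natural logarithm (other tools may use a different base, which rescales H' but not J'), and returning 0.0 for communities with fewer than two taxa is a convention assumed here.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


def pielou_evenness(counts: Sequence[int]) -> float:
    """Pielou's evenness J' = H' / ln(S), with S the number of observed taxa."""
    observed = sum(1 for count in counts if count > 0)
    if observed < 2:
        # J' is undefined for a single taxon; 0.0 is returned by convention here.
        return 0.0
    return shannon_index(counts) / math.log(observed)


# Hypothetical read counts for two communities with the same four taxa.
even_community = [10, 10, 10, 10]
uneven_community = [97, 1, 1, 1]

print(pielou_evenness(even_community))                                    # 1.0 (perfectly even)
print(pielou_evenness(uneven_community) < pielou_evenness(even_community))  # True
```

    Both communities contain the same number of taxa, so the difference in the two values comes entirely from how evenly the reads are distributed.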

    Re-review: The authors' response has fully addressed my concerns. The quality of the paper has clearly improved. I agree with the publication of this article.