cellsnake: a user-friendly tool for single-cell RNA sequencing analysis

Abstract

Background

Single-cell RNA sequencing (scRNA-seq) provides high-resolution transcriptome data for understanding the heterogeneity of cell populations at the single-cell level. Analyzing scRNA-seq data requires numerous computational tools. However, non-expert users often face installation issues, missing critical functionality or batch analysis modes, and the steep learning curves of existing pipelines.

Results

We have developed cellsnake, a comprehensive, reproducible, and accessible single-cell data analysis workflow, to overcome these problems. Cellsnake offers advanced features for standard users and facilitates downstream analyses in both R and Python environments. It is also designed for easy integration into existing workflows, allowing for rapid analyses of multiple samples.

Conclusion

As an open-source tool, cellsnake is available through Bioconda, PyPI, Docker, and GitHub, making it a cost-effective and user-friendly option for researchers. By using cellsnake, researchers can streamline the analysis of scRNA-seq data and gain insights into the complex biology of single cells.

    This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giad091), which carries out open, named peer review. The reviews are published under a CC-BY 4.0 license:

    Reviewer name: Qianqian Song

    This paper presents cellsnake, an open-source tool for single-cell data analysis. Cellsnake offers advanced features for standard users and facilitates downstream analyses in both R and Python environments. It is also designed for easy integration into existing workflows, allowing for rapid analyses of multiple samples. I like the incorporation of metagenome analysis into this tool, which sets it apart from other available single-cell analysis tools.

    1. I looked through the tutorial and have a specific question about the resolution parameter. Does this argument need to be pre-selected, or can cellsnake select a resolution automatically?

    2. Is it possible to add a color legend to the UMAP plots instead of labeling every cell type directly on the UMAP? Distinguishing cell types can be very hard, especially when there are many of them.

    3. If the single-cell data are profiled from human tissue, can cellsnake also be used to perform microbiome analysis?

    4. I recommend that the authors compare cellsnake with other existing tools and highlight its pros and cons.

    Reviewer name: Tazro Ohta

    The manuscript describes Cellsnake, a user-friendly tool for single-cell RNA sequencing analysis aimed at users who are not bioinformatics experts. Cellsnake runs as a command-line application, which allows offline analysis of sensitive data. The integration of popular single-cell RNA-seq analysis software within Cellsnake, as described in Table 1, enhances its utility as a comprehensive workflow. Cellsnake has different execution options (minimal, standard, and advanced) with varying outputs and execution times. The authors provide well-structured online documentation, including helpful quick-start examples that make Cellsnake easy to understand and use.

    I tested the tool with the Docker image and the provided fetal brain dataset, and it performed as expected. The manuscript explains the functions well and reproduces results from existing studies using publicly available datasets. The following issues need to be addressed by the authors.

    1. The authors should cite the Snakemake paper (https://doi.org/10.1093/bioinformatics/bts480) to acknowledge its contribution.

    2. To support the claim that Cellsnake has unique features, the manuscript should include a comparison with similar methods, such as the Galaxy-based approach described at https://doi.org/10.1093/gigascience/giaa102.

    3. I recommend hosting the Docker container image on both GitHub Container Registry and Docker Hub for better availability and redundancy. The authors should also publish the Dockerfile so that users can build the container image themselves if needed.

    4. The online documentation for the fetal liver example (https://cellsnake.readthedocs.io/en/latest/fetalliver.html) is missing a link to the dataset; this needs to be fixed. The fetalbrain dataset, currently shared via Dropbox, should also be deposited in Zenodo to improve accessibility and long-term preservation.

    5. To assist users who want to use Cellsnake as a Snakemake workflow, the documentation should give clear instructions on running Cellsnake as a standalone Snakemake pipeline. This would help users of existing workflow platforms that accept Snakemake workflows.

    6. The benchmarking of Cellsnake should state the computing requirements more precisely than "a standard laptop". My run of "cellsnake integrated standard" on the fetalbrain dataset took more than 17 h via Docker on an M1 Max MacBook Pro. This may be because the provided Docker image is built for amd64 (x86-64), so my ARM-based MacBook had to run the container under emulation in a VM; even so, recommended computational specifications would help users. A GitHub issue in the Cellsnake repository also notes that the software has not been tested with Conda on Windows, which should at least be mentioned in the online documentation.

    7. In the Data Availability section, please use correct formatting and consistent identifiers for public data, for example replacing SRP129388 with PRJNA429950 and E-MTAB-7407 with PRJEB34784, and state that these IDs are from the BioProject database. It is also important to mention that EGA files are under controlled access and require permission to retrieve.

    8. The references in the manuscript should be formatted consistently and include publication years and, where available, DOIs.

    9. The help message of the Cellsnake command indicates that its default values are set for human samples. The authors should state in the manuscript that the pipeline is configured for human samples and requires further configuration for samples from other organisms. A step-by-step guide to configuring the settings for other species, including downloading the reference data, would help reach a wider audience.
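    Regarding point 6, the slowdown described is consistent with running an amd64 image on an arm64 host. The following Python sketch is illustrative and not part of cellsnake: it collects basic host specifications that a benchmark report could state and flags when an image's architecture differs from the host's. The function names are hypothetical, and the `amd64` image architecture is taken from the reviewer's description rather than from inspecting the image.

```python
import os
import platform

# Aliases reported by platform.machine() on common systems, normalized to
# Docker-style platform names. Unknown names pass through lowercased.
ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(name: str) -> str:
    """Map an architecture string to Docker-style naming (amd64, arm64)."""
    key = name.strip().lower()
    return ARCH_ALIASES.get(key, key)


def needs_emulation(host_arch: str, image_arch: str) -> bool:
    """True if the image targets a different CPU architecture than the host,
    in which case the container can only run under (slow) emulation."""
    return normalize_arch(host_arch) != normalize_arch(image_arch)


def total_memory_gib():
    """Physical memory in GiB via sysconf; None where unavailable (e.g. Windows)."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return round(pages * page_size / 2**30, 1)


def describe_host() -> dict:
    """Basic specifications worth reporting alongside a benchmark run."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "arch": normalize_arch(platform.machine()),
        "logical_cpus": os.cpu_count(),
        "memory_gib": total_memory_gib(),
    }


if __name__ == "__main__":
    host = describe_host()
    print(host)
    # Assumption: the published cellsnake image is amd64-only, per the review.
    if needs_emulation(host["arch"], "amd64"):
        print("warning: an amd64 image will run under emulation on this host")
```

    Reporting these fields together with the dataset and the run mode used (for example "integrated standard") would make timings comparable across machines.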
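    For point 7, the identifier changes can be applied and checked mechanically. This is a minimal Python sketch, assuming the two mappings given in the review are the only ones needed; the helper names are hypothetical. The pattern follows the INSDC BioProject accession format ("PRJ", an archive letter N, E, or D, a type letter, then digits), and any EGA accession is flagged as controlled access.

```python
import re

# INSDC BioProject accession: "PRJ" + archive (N = NCBI, E = EBI, D = DDBJ)
# + one type letter + digits, e.g. PRJNA429950 or PRJEB34784.
BIOPROJECT_RE = re.compile(r"^PRJ[NED][A-Z]\d+$")

# Replacements suggested in point 7 of the review.
SUGGESTED_BIOPROJECT = {
    "SRP129388": "PRJNA429950",
    "E-MTAB-7407": "PRJEB34784",
}


def to_bioproject(accession: str) -> str:
    """Return the BioProject accession to cite for a dataset identifier."""
    acc = accession.strip()
    if BIOPROJECT_RE.match(acc):
        return acc  # already a BioProject accession
    if acc in SUGGESTED_BIOPROJECT:
        return SUGGESTED_BIOPROJECT[acc]
    raise KeyError(f"no BioProject mapping known for {acc!r}")


def is_controlled_access(accession: str) -> bool:
    """EGA accessions (EGAS, EGAD, EGAF, ...) require approved access."""
    return accession.strip().upper().startswith("EGA")


if __name__ == "__main__":
    for acc in ["SRP129388", "E-MTAB-7407", "PRJNA429950"]:
        print(acc, "->", to_bioproject(acc))
```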
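    Point 8 can also be checked automatically before submission. The Python sketch below flags reference strings that lack a four-digit publication year or a DOI. It uses a DOI pattern based on the one Crossref suggests for modern DOIs, and it removes the DOI before looking for a year so that digits inside a DOI are not mistaken for one. The function names are hypothetical, and a missing DOI can be legitimate, since the review asks for DOIs only where available.

```python
import re

# Based on Crossref's suggested pattern for modern DOIs (case-insensitive).
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)
# A plausible publication year (1900-2099) standing on its own.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_doi(reference: str):
    """Return the first DOI in a reference string, without trailing punctuation."""
    match = DOI_RE.search(reference)
    return match.group(0).rstrip(".,;") if match else None


def check_reference(reference: str) -> list:
    """List formatting problems in one reference (an empty list means OK)."""
    problems = []
    # Drop the DOI first so digits inside it are not taken for a year.
    without_doi = DOI_RE.sub(" ", reference)
    if not YEAR_RE.search(without_doi):
        problems.append("missing publication year")
    if extract_doi(reference) is None:
        problems.append("missing DOI")
    return problems
```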
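    To illustrate why the human defaults in point 9 matter, consider a quality-control setting that commonly differs between organisms: the prefix used to identify mitochondrial genes. Human gene symbols use "MT-" (for example MT-CO1), whereas mouse symbols use "mt-" (for example mt-Co1). The Python sketch below is hypothetical and does not reflect cellsnake's actual configuration keys. It shows that applying the human prefix to mouse data silently reports a mitochondrial fraction of zero, so a mitochondrial-content filter would remove nothing.

```python
# Hypothetical per-species settings; only the mitochondrial gene prefix is
# shown. Matching is case-sensitive because gene symbol case differs.
SPECIES_SETTINGS = {
    "human": {"mito_prefix": "MT-"},
    "mouse": {"mito_prefix": "mt-"},
}


def mito_fraction(counts: dict, species: str) -> float:
    """Fraction of one cell's counts that come from mitochondrial genes."""
    if species not in SPECIES_SETTINGS:
        raise ValueError(f"no settings configured for species {species!r}")
    prefix = SPECIES_SETTINGS[species]["mito_prefix"]
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in counts.items() if gene.startswith(prefix))
    return mito / total


if __name__ == "__main__":
    mouse_cell = {"mt-Co1": 30, "mt-Nd1": 10, "Actb": 60}
    print(mito_fraction(mouse_cell, "mouse"))  # correct prefix
    print(mito_fraction(mouse_cell, "human"))  # wrong prefix: reports 0.0
```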