rCASC: reproducible classification analysis of single-cell sequencing data

Abstract

Background

Single-cell RNA sequencing is essential for investigating cellular heterogeneity and highlighting cell subpopulation-specific signatures. Single-cell sequencing applications have spread from conventional RNA sequencing to epigenomics, e.g., ATAC-seq. Many related algorithms and tools have been developed, but few computational workflows combine analysis flexibility with both functional reproducibility (i.e., information about the data and the tools used is saved as metadata) and computational reproducibility (i.e., a real image of the computational environment used to generate the data is stored) in a user-friendly environment.

Findings

rCASC is a modular workflow that provides an integrated analysis environment (from count generation to cell subpopulation identification) and exploits Docker containerization to achieve both functional and computational reproducibility in data analysis. rCASC provides preprocessing tools to remove low-quality cells and/or specific biases, e.g., cell cycle. Subpopulation discovery can be performed with several clustering techniques based on different distance metrics. Cluster quality is then estimated through a new metric, the "cell stability score" (CSS), which describes how stably a cell remains in its cluster when the data are perturbed by removing a random set of cells from the cell population. CSS provides better cluster robustness information than the silhouette metric. Moreover, rCASC's tools can identify cluster-specific gene signatures.
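
To make the idea behind CSS concrete, the following is a minimal sketch in Python, not rCASC's implementation. It clusters a toy data set, repeatedly removes a random fraction of cells, re-clusters, and reports for each cell the fraction of perturbed runs in which it stays in a cluster that closely matches its reference cluster. The function names, the toy k-means, and the parameters `drop_frac`, `n_perm`, and `jaccard_min` are all assumptions introduced for illustration.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def kmeans(points, k, iters=50):
    """Tiny deterministic k-means; returns one cluster label per point."""
    # Farthest-point initialisation avoids random restarts in this toy.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster ends up empty.
            updated.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else centers[j]
            )
        if updated == centers:
            break
        centers = updated
    return labels


def cell_stability(points, k, n_perm=20, drop_frac=0.1, jaccard_min=0.8, seed=0):
    """Hypothetical stability score: per cell, the fraction of perturbed runs
    (among those that kept the cell) in which it stays in a matching cluster."""
    rng = random.Random(seed)
    n = len(points)
    ref = kmeans(points, k)
    ref_sets = [{i for i in range(n) if ref[i] == j} for j in range(k)]
    stable = [0] * n
    kept_count = [0] * n
    for _ in range(n_perm):
        # Perturbation: drop a random subset of cells, then re-cluster the rest.
        kept = sorted(rng.sample(range(n), n - int(n * drop_frac)))
        kept_set = set(kept)
        labels = kmeans([points[i] for i in kept], k)
        for j in range(k):
            members = {kept[pos] for pos, lab in enumerate(labels) if lab == j}
            if not members:
                continue
            # Match the perturbed cluster to the most similar reference cluster,
            # comparing only the cells that survived the perturbation.
            best = max(ref_sets, key=lambda s: jaccard(members, s & kept_set))
            if jaccard(members, best & kept_set) >= jaccard_min:
                for i in members & best:
                    stable[i] += 1
        for i in kept:
            kept_count[i] += 1
    return [stable[i] / kept_count[i] if kept_count[i] else 0.0 for i in range(n)]
```

For example, calling `cell_stability(cells, k=2)` on two well-separated groups of 2-D points should give every cell a high score, whereas cells lying between clusters would be expected to score lower. The Jaccard-based matching is one plausible way to compare clusterings across runs; the published method may differ in its details.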

Conclusions

rCASC is a modular workflow with new features that could help researchers define cell subpopulations and detect subpopulation-specific markers. It uses Docker for ease of installation and to achieve computationally reproducible analysis. A Java GUI is provided for users without programming skills in R.

Article activity feed

  1. Now published in GigaScience doi: 10.1093/gigascience/giz105

    Authors: Luca Alessandrì (2), Marco Beccuti (1), Maddalena Arigoni (2), Martina Olivero (3), Greta Romano (1), Gennaro De Libero (4), Luigia Pace (5), Francesca Cordero (1), Raffaele A Calogero (2)

    Affiliations:
    (1) Department of Computer Sciences, University of Torino, Corso Svizzera 185, Torino, Italy
    (2) Department of Molecular Biotechnology and Health Sciences, University of Torino, Via Nizza 52, Torino, Italy
    (3) Department of Oncology, University of Torino, SP142, 95, 10060 Candiolo TO, Italy
    (4) Department Biomedizin, University of Basel, Hebelstrasse 20, 4031 Basel, Switzerland
    (5) IIGM, Via Nizza 52, Torino, Italy

    A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giz105 ), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

    These peer reviews were as follows:

    Reviewer 1: http://dx.doi.org/10.5524/REVIEW.101893
    Reviewer 2: http://dx.doi.org/10.5524/REVIEW.101894