RNA-SeqEZPZ: A Point-and-Click Pipeline for Comprehensive Transcriptomics Analysis with Interactive Visualizations
This article has been reviewed by the following groups:
- Evaluated articles (GigaScience)
Abstract
RNA-Seq analysis has become a routine task in numerous genomic research labs, driven by the reduced cost of bulk RNA sequencing experiments. These experiments generate billions of reads that require accurate, efficient, effective, and reproducible analysis. However, the time required for comprehensive analysis remains a bottleneck. Many labs rely on in-house scripts, making standardization and reproducibility challenging. To address this, we developed RNA-SeqEZPZ, an automated pipeline with a user-friendly point-and-click interface that enables rigorous and reproducible RNA-Seq analysis without requiring programming or bioinformatics expertise. For advanced users, the pipeline can also be executed from the command line, allowing individual steps to be customized to suit specific requirements.
The pipeline includes multiple steps, from quality control, alignment, filtering, and read counting to differential expression and pathway analysis. We offer two implementations of the pipeline, using either (1) bash and SLURM or (2) Nextflow. Both options allow straightforward installation and make it easy for users familiar with either approach to modify and run the pipeline across various computing environments.
RNA-SeqEZPZ provides an interactive visualization tool built with R Shiny for selecting the FASTQ files to analyze and for comparing differentially expressed genes and their functions across experimental conditions. The tools required by the pipeline are packaged into a Singularity image for ease of installation and to ensure replicability. Finally, the pipeline performs a thorough statistical analysis and provides an option for batch adjustment to minimize the effects of noise due to technical variation across replicates.
RNA-SeqEZPZ is freely available and can be downloaded from https://github.com/cxtaslim/RNA-SeqEZPZ .
This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf133), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:
Reviewer 2: Yang Yang
The manuscript describes RNA-SeqEZPZ, an automated RNA-Seq analysis pipeline with a user-friendly point-and-click interface. It aims to make comprehensive transcriptomics analyses more accessible to researchers who lack extensive bioinformatics skills by addressing common issues with standardization and usability that arise from using in-house scripts. The pipeline's main features are the use of a Singularity container to simplify software installation and a Nextflow version to support scalability across different computing environments like clouds and clusters. However, I'm not sure if this manuscript fits the journal's scope in its current form. It seems to be just an integration of existing tools without offering new methods or findings.
Major comments:
The manuscript mentions several existing RNA-Seq pipelines, such as ENCODE, nf-core, ROGUE, Shiny-Seq, bulkAnalyseR, Partek™ Flow, RaNA-Seq, and RASflow. A more detailed comparison of RNA-SeqEZPZ with these tools is needed, especially regarding specific features, performance metrics, and ease of use. For example, it would be helpful to compare the computational resources required by each pipeline or the statistical methods each uses for differential expression analysis.
The manuscript emphasizes reproducibility through Singularity containers and Nextflow. However, it would be stronger if it included a more rigorous demonstration of reproducibility. This could involve running the pipeline on multiple datasets and comparing the results, or providing a detailed protocol for other researchers to reproduce the findings.
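One simple way to act on this suggestion is to run the pipeline twice on the same input and check whether the outputs are byte-identical. The Python sketch below is a hypothetical helper (`compare_runs` and `file_digest` are illustrative names, not part of RNA-SeqEZPZ) that walks two output directories and compares SHA-256 digests of matching files; the default `*.txt` pattern is an assumption about output file names.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_runs(run_a: Path, run_b: Path, pattern: str = "*.txt") -> Dict[str, bool]:
    """Map each relative path matching `pattern` in either run to True if
    both runs contain the file and the two copies are byte-identical."""
    files_a = {p.relative_to(run_a) for p in run_a.rglob(pattern) if p.is_file()}
    files_b = {p.relative_to(run_b) for p in run_b.rglob(pattern) if p.is_file()}
    result: Dict[str, bool] = {}
    for rel in sorted(files_a | files_b):
        if rel in files_a and rel in files_b:
            result[str(rel)] = file_digest(run_a / rel) == file_digest(run_b / rel)
        else:
            result[str(rel)] = False  # file produced by only one of the runs
    return result
```

Byte identity is a strict criterion: files that embed timestamps or run paths, such as logs and HTML reports, will differ even when the biological results agree, so in practice one would restrict the pattern to count tables and differential expression results.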
The manuscript highlights the scalability and portability of RNA-SeqEZPZ due to its Nextflow version. It would be useful to include specific examples of how the pipeline has been used in different computing environments (e.g., cloud, cluster) and to provide performance data to demonstrate its scalability.
The point-and-click interface is a key feature, but the manuscript could benefit from a more detailed description of the interface and its functionalities. Including screenshots or a video demonstration would be valuable for potential users.
The manuscript shows the effects of batch adjustment using a public dataset. It would be beneficial to expand this section with a discussion of the limitations of batch adjustment methods and to provide guidance on when and how to apply them.
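To illustrate one limitation the reviewer alludes to, the sketch below shows a naive batch adjustment (per-gene mean-centering of log-scale expression within each batch) together with a guard against confounding. It is not the method RNA-SeqEZPZ uses, which the reviews do not specify, and the function names are hypothetical. Centering is only reasonable when every batch contains every condition; if a batch holds a single condition, subtracting the batch mean also removes the biological signal.

```python
import numpy as np


def batch_covers_all_conditions(batch, condition):
    """True if every batch contains at least one sample of every condition."""
    conditions = set(condition)
    per_batch = {}
    for b, c in zip(batch, condition):
        per_batch.setdefault(b, set()).add(c)
    return all(seen == conditions for seen in per_batch.values())


def center_by_batch(log_expr, batch, condition):
    """Naive adjustment of a genes x samples log-expression matrix:
    subtract each gene's per-batch mean, then add back its overall mean.

    Raises ValueError when some batch lacks a condition, because centering
    would then also remove real condition differences."""
    log_expr = np.asarray(log_expr, dtype=float)
    if not batch_covers_all_conditions(batch, condition):
        raise ValueError("batch and condition are confounded; model batch in the design instead")
    batch = np.asarray(batch)
    grand_mean = log_expr.mean(axis=1, keepdims=True)
    adjusted = log_expr.copy()
    for b in np.unique(batch):
        cols = batch == b
        adjusted[:, cols] -= log_expr[:, cols].mean(axis=1, keepdims=True)
    return adjusted + grand_mean
```

The guard only detects batches that are missing a condition; it does not catch unbalanced designs, where unequal numbers of samples per condition within a batch also bias the batch means. In such cases, including batch as a covariate in the differential expression model (for example `design = ~ batch + condition` in DESeq2) is generally preferable to adjusting the data beforehand.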
Reviewer 1: Unitsa Sangket
This research presents a well-designed and powerful program for comprehensive transcriptomics analysis with interactive visualizations. The tool is conceptually strong and user-friendly, requiring only raw reads in FASTQ format to initiate the analysis, with no need for manual quality checks. However, a limitation is that the software must be installed manually, which typically requires access to a high-performance computing (HPC) system and support from a system administrator for installation and server maintenance. As such, non-technical users may find it difficult to install and operate the program independently.
With appropriate revisions based on the comments below, the manuscript has the potential to be significantly improved.
Page 8, lines 158-160: "DESeq2 was selected based on findings by Rapaport et al. (2013)⁴⁰, which demonstrated its superior specificity and sensitivity as well as good control of false positive errors." The findings in the paper "bestDEG: a web-based application automatically combines various tools to precisely predict differentially expressed genes (DEGs) from RNA-Seq data" (https://peerj.com/articles/14344) show that DESeq2 achieves higher sensitivity than other tools when applied to newer human RNA-Seq datasets. This finding should be included in the manuscript, for example: "DESeq2 was selected based on findings by Rapaport et al. (2013)⁴⁰, which demonstrated its superior specificity and sensitivity as well as good control of false positive errors. Additionally, recent findings from the bestDEG study (cite bestDEG) further support the higher sensitivity of DESeq2 relative to other tools on newer human RNA-Seq datasets."
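One component of false positive control in a DESeq2 analysis is multiple-testing correction: DESeq2's `results()` reports adjusted p-values (`padj`) computed with the Benjamini-Hochberg (BH) procedure by default. The minimal Python sketch below implements plain BH for illustration; the function name is hypothetical, and it does not reproduce DESeq2 exactly, because DESeq2 also applies independent filtering of low-count genes before adjustment and can leave some `padj` values as NA.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value to the smallest, enforcing monotonicity
    # so that a smaller raw p never receives a larger adjusted value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, raw p-values `[0.01, 0.04, 0.03, 0.005]` are adjusted to approximately `[0.02, 0.04, 0.04, 0.02]`.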
Page 6, lines 124-125: "Raw reads quality control are then performed using FASTQC¹⁸ and QC reports are compiled using MultiQC¹⁹." The quality of the trimmed reads can be assessed using FastQC, as demonstrated and summarized in the paper "VOE: automated analysis of variant epitopes of SARS-CoV-2 for the development of diagnostic tests or vaccines for COVID-19" (https://peerj.com/articles/17504/) (page 4, last paragraph): "(1) Per base sequence quality (median value of each base greater than 25), (2) per sequence quality (median quality greater than 27), (3) per base N content (N base less than 5% at each read position) and (4) adapter content (adapter sequences at each position less than 5% of all reads)". This point should be mentioned in the manuscript, including the cutoff values for each FastQC metric used in RNA-SeqEZPZ, as these thresholds may vary. For example: "The quality of the trimmed FASTQ reads was assessed based on the four FastQC metrics, as summarized by Lee et al. (2024). The cutoffs for RNA-SeqEZPZ were set as follows: the median value of each base must be greater than [x], the median quality score must be above [y], the percentage of N bases at each read position must be less than [z]%, and the proportion of adapter sequences at each position must be below [xx]% of all reads."
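To make the requested cutoffs concrete, the sketch below checks four summary metrics for one FASTQ file against thresholds. The default values are the ones quoted above from Lee et al. (2024), used only as placeholders because RNA-SeqEZPZ's own cutoffs are not stated. The inputs are assumed to have already been extracted from a FastQC report (one value per read position, or a quality-to-read-count distribution for per-sequence quality), and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class QcThresholds:
    # Placeholder values quoted from Lee et al. (2024), not RNA-SeqEZPZ defaults.
    min_base_median: float = 25.0      # per-base median quality must exceed this
    min_sequence_median: float = 27.0  # median per-read quality must exceed this
    max_n_percent: float = 5.0         # N content at every position must be below this
    max_adapter_percent: float = 5.0   # adapter content at every position must be below this


def weighted_median(value_counts: Sequence[Tuple[float, int]]) -> float:
    """Lower median of a distribution given as (value, count) pairs."""
    pairs = sorted(value_counts)
    total = sum(count for _, count in pairs)
    if total == 0:
        raise ValueError("empty distribution")
    cumulative = 0
    for value, count in pairs:
        cumulative += count
        if cumulative >= total / 2:
            return value
    return pairs[-1][0]  # not reached when total > 0


def check_sample(
    base_medians: Sequence[float],
    sequence_quality_counts: Sequence[Tuple[float, int]],
    n_percent: Sequence[float],
    adapter_percent: Sequence[float],  # assumed: max over adapter types, per position
    thresholds: Optional[QcThresholds] = None,
) -> Dict[str, bool]:
    """Return metric name -> pass/fail for one FASTQ file."""
    t = thresholds or QcThresholds()
    return {
        "per_base_quality": min(base_medians) > t.min_base_median,
        "per_sequence_quality": weighted_median(sequence_quality_counts) > t.min_sequence_median,
        "per_base_n_content": max(n_percent) < t.max_n_percent,
        "adapter_content": max(adapter_percent) < t.max_adapter_percent,
    }
```

Keeping the thresholds in a separate object makes it straightforward to document them in the manuscript and to let users override them, which is the flexibility the reviewer asks to be stated.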
The programs used for read alignment and for creating the counts table should be stated in the manuscript.
The default cutoffs for FDR and log₂ fold change, as well as instructions on how to modify these thresholds, should be clearly stated in the manuscript.
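A minimal sketch of how such thresholds are typically applied to DESeq2-style output is shown below. The defaults (FDR 0.05, |log₂ fold change| ≥ 1) are common placeholder choices, not RNA-SeqEZPZ's documented defaults, which the reviewer notes are not stated; `DeResult` and `filter_de_genes` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DeResult:
    gene: str
    log2_fold_change: float
    padj: Optional[float]  # DESeq2 can report NA; represented here as None or NaN


def is_significant(r: DeResult, fdr: float, min_abs_lfc: float) -> bool:
    """True if padj is present, below `fdr`, and |LFC| is at least `min_abs_lfc`."""
    if r.padj is None or math.isnan(r.padj):
        return False
    return r.padj < fdr and abs(r.log2_fold_change) >= min_abs_lfc


def filter_de_genes(
    results: Iterable[DeResult], fdr: float = 0.05, min_abs_lfc: float = 1.0
) -> Tuple[List[str], List[str]]:
    """Split significant genes into up- and down-regulated lists (placeholder cutoffs)."""
    kept = [r for r in results if is_significant(r, fdr, min_abs_lfc)]
    up = [r.gene for r in kept if r.log2_fold_change > 0]
    down = [r.gene for r in kept if r.log2_fold_change < 0]
    return up, down
```

Filtering on |log₂ fold change| after testing is different from testing against a fold-change threshold, which DESeq2 supports through the `lfcThreshold` argument of `results()`; the manuscript could state which approach the pipeline uses.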
