Simplifying the development of portable, scalable, and reproducible workflows

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This paper by Piccolo and collaborators provides a general introduction to the Common Workflow Language (CWL), with working examples taken from bioinformatics. The authors also introduce the ToolJig web application, intended as a way of creating CWL documents interactively. This work should be of interest not only to beginner bioinformaticians but also to more experienced professionals who do not routinely make use of the latest developments in reproducible research.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

Abstract

Command-line software plays a critical role in biology research. However, processes for installing and executing software differ widely. The Common Workflow Language (CWL) is a community standard that addresses this problem. Using CWL, tool developers can formally describe a tool’s inputs, outputs, and other execution details. CWL documents can include instructions for executing tools inside software containers. Accordingly, CWL tools are portable: they can be executed on diverse computers, including personal workstations, high-performance clusters, or the cloud. CWL also supports workflows, which describe dependencies among tools and specify how outputs from one tool are used as inputs to others. To date, CWL has been used primarily for batch processing of large datasets, especially in genomics, but it can also be used for the analytical steps of a study. This article explains key concepts about CWL and software containers and provides examples of using CWL in biology research. CWL documents are text-based, so they can be created manually, without computer programming. However, the effort needed to ensure that these documents conform to the CWL specification may deter some users from adopting it. To address this gap, we created ToolJig, a Web application that enables researchers to create CWL documents interactively. ToolJig validates information provided by the user to ensure it is complete and valid. After creating a CWL tool or workflow, the user can create ‘input-object’ files, which store values for a particular invocation of a tool or workflow. In addition, ToolJig provides examples of how to execute the tool or workflow via a workflow engine. ToolJig and our examples are available at https://github.com/srp33/ToolJig .
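The pieces described above (a tool description with declared inputs and outputs, a container requirement, and a separate input-object file holding the values for one invocation) can be illustrated with a short sketch. The example is hypothetical and is not taken from the article or from ToolJig's source: it builds a minimal CWL v1.2 `CommandLineTool` that wraps `sort` inside an `ubuntu:22.04` container, builds a matching input object, and applies a simple completeness check in the spirit of the validation described above. The file names (`sort.cwl`, `sort_job.json`, `data/example.txt`) and helper names (`validate`, `is_optional`, `save`) are placeholders, and the check only understands the map form of `inputs`, not the full CWL specification.

```python
import json

# Hypothetical CWL v1.2 CommandLineTool that sorts the lines of a text file.
# JSON is valid YAML, so this dict can be saved directly as a .cwl document.
tool = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": "sort",
    # Execute the command inside a software container (image name is an assumption).
    "requirements": [{"class": "DockerRequirement", "dockerPull": "ubuntu:22.04"}],
    "inputs": {
        # Required input, placed after any flags on the command line.
        "input_file": {"type": "File", "inputBinding": {"position": 1}},
        # Optional flag: a trailing "?" makes the type nullable; when true, "-r" is added.
        "reverse": {"type": "boolean?", "inputBinding": {"prefix": "-r"}},
    },
    # Capture standard output as the tool's only output file.
    "stdout": "sorted.txt",
    "outputs": {"sorted_file": {"type": "stdout"}},
}

# Input object: the values for one particular invocation of the tool (placeholder path).
job = {"input_file": {"class": "File", "path": "data/example.txt"}}

REQUIRED_KEYS = ("cwlVersion", "class", "inputs", "outputs")


def is_optional(spec):
    """Return True if an input may be omitted from the input object."""
    if isinstance(spec, str):  # shorthand such as "File" or "string?"
        spec = {"type": spec}
    if "default" in spec:
        return True
    declared = spec.get("type")
    if isinstance(declared, str):
        return declared.endswith("?")
    if isinstance(declared, list):  # union type, e.g. ["null", "int"]
        return "null" in declared
    return False


def validate(tool, job):
    """Return a list of problems; an empty list means none were found.

    Simplified sketch: assumes the map form of "inputs" and checks only
    top-level keys and input coverage, not the full CWL specification.
    """
    problems = [f"tool is missing '{key}'" for key in REQUIRED_KEYS if key not in tool]
    inputs = tool.get("inputs", {})
    for name, spec in inputs.items():
        if name not in job and not is_optional(spec):
            problems.append(f"input object has no value for required input '{name}'")
    for name in job:
        if name not in inputs:
            problems.append(f"input object sets unknown input '{name}'")
    return problems


def save(tool, job, tool_path="sort.cwl", job_path="sort_job.json"):
    """Write the tool and its input object to disk as JSON."""
    with open(tool_path, "w") as handle:
        json.dump(tool, handle, indent=2)
    with open(job_path, "w") as handle:
        json.dump(job, handle, indent=2)


print(validate(tool, job))  # prints []: the input object covers every required input
```

After calling `save(tool, job)`, the two files could be passed to a workflow engine, for example `cwltool sort.cwl sort_job.json`, assuming `cwltool` and Docker are installed. A real validator such as `cwltool --validate` checks far more than this sketch does; the point here is only to show how a tool description and an input object fit together.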

Article activity feed

  1. Reviewer #1 (Public Review):

    In this manuscript, Piccolo and collaborators provide a detailed overview of the Common Workflow Language (CWL) for beginner bioinformaticians, and perhaps also for more experienced ones who may not be up to date with the latest developments in reproducible research. They also provide a webpage, ToolJig, for creating CWL documents without needing to install any software or learn the specifications of the format in much detail. The manuscript is written in the form of a tutorial; its major strengths are that the explanations are very clear and are accompanied by illustrative figures and by examples in their GitHub repository. I do not see any major weaknesses that need to be fixed. As science is currently undergoing a major reproducibility crisis, I think it is crucial that detailed and accessible pieces such as this one are published to teach scientists to create fully reproducible code. If CWL is adopted widely, I believe it may help alleviate these issues.

  2. Reviewer #2 (Public Review):

    The paper provides working examples of various bioinformatics pipelines written in CWL. The aim is to provide a general, example-driven introduction to CWL for researchers with limited programming skills. The authors also created the web application ToolJig to facilitate the interactive generation of CWL documents.

    Strengths:

    - Working examples of useful bioinformatic pipelines written in CWL can provide a quick template for people to start on their own CWL projects.
    - Emphasis on Docker and containers as the building blocks of pipelines that are portable and reproducible.
    - The ToolJig application as an interactive aid for inexperienced users building their CWL documents.

    Weaknesses:

    There is some confusion about how containers are used relative to the location of the workflow manager (production vs. publication workflows). A single container with all the required analysis software, the CWL document, and a workflow manager seems well suited for distribution with a publication, to enable reproducible calculations. However, for production needs, a modular design with separate containers and a workflow manager outside of the containers seems a better choice. It is hard to distinguish these two usages in the manuscript.
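The modular arrangement Reviewer #2 describes (one container per tool, with the workflow manager running outside the containers) can be sketched as a small CWL v1.2 `Workflow`. This is a hypothetical illustration, not an example from the article: the step names, the `container_tool` helper, and the images (`ubuntu:22.04`, `debian:12`) are assumptions. Each step embeds its own `CommandLineTool` with its own `DockerRequirement`, and `step_order` mimics, in simplified form, how an engine on the host derives execution order from the fact that one step's output feeds another step's input. It assumes step inputs are plain source strings rather than the expanded `source` form.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def container_tool(command, image, output_name):
    """Hypothetical helper: a CommandLineTool that runs `command` on one input
    file inside its own container image and captures stdout as its output."""
    return {
        "class": "CommandLineTool",
        "baseCommand": command,
        "requirements": [{"class": "DockerRequirement", "dockerPull": image}],
        "inputs": {"in_file": {"type": "File", "inputBinding": {"position": 1}}},
        "stdout": output_name,
        "outputs": {"out_file": {"type": "stdout"}},
    }


# Two steps, each with its own container; the workflow engine itself is not in
# any container and coordinates the steps from the host.
workflow = {
    "cwlVersion": "v1.2",
    "class": "Workflow",
    "inputs": {"raw": "File"},
    "outputs": {"result": {"type": "File", "outputSource": "dedupe/out_file"}},
    "steps": {
        "sort": {
            "run": container_tool("sort", "ubuntu:22.04", "sorted.txt"),
            "in": {"in_file": "raw"},  # a workflow-level input
            "out": ["out_file"],
        },
        "dedupe": {
            "run": container_tool("uniq", "debian:12", "unique.txt"),
            "in": {"in_file": "sort/out_file"},  # the output of the "sort" step
            "out": ["out_file"],
        },
    },
}


def step_order(wf):
    """Order steps so that each runs after the steps whose outputs it consumes.

    "step/output" refers to another step; a bare id refers to a workflow input.
    """
    graph = {}
    for name, step in wf["steps"].items():
        graph[name] = {src.split("/", 1)[0] for src in step["in"].values() if "/" in src}
    return list(TopologicalSorter(graph).static_order())


def step_images(wf):
    """Map each step to the container image its embedded tool declares."""
    images = {}
    for name, step in wf["steps"].items():
        for req in step["run"].get("requirements", []):
            if req["class"] == "DockerRequirement":
                images[name] = req["dockerPull"]
    return images


print(step_order(workflow))   # ['sort', 'dedupe']
print(step_images(workflow))  # {'sort': 'ubuntu:22.04', 'dedupe': 'debian:12'}
```

Saved as JSON and run with an engine installed on the host (for example, `cwltool`), such a workflow would pull each step's image separately. In the publication-oriented arrangement the reviewer contrasts it with, a single image would instead bundle the engine, the CWL documents, and all of the tools.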