VolcaNoseR – a web app for creating, exploring, labeling and sharing volcano plots

Abstract

Comparative genome- and proteome-wide screens yield large amounts of data. To present such datasets efficiently and to simplify the identification of hits, the results are often shown in a type of scatterplot known as a volcano plot, which plots a measure of effect size against a measure of significance. The data points with the largest effect size and a statistical significance beyond a user-defined threshold are considered hits. Such hits are usually annotated in the plot with a label showing their name. Volcano plots can represent tens of thousands of data points, of which typically only a handful are annotated. Information on the data points that are not annotated is difficult or impossible to access. To simplify access to the data and enable its re-use, we have developed an open-source online web tool with R/Shiny. The web app is named VolcaNoseR and it can be used to create, explore, label and share volcano plots (https://huygens.science.uva.nl/VolcaNoseR). When the data are stored in an online data repository, the web app can retrieve the data together with user-defined settings to generate a customized, interactive volcano plot. Users can interact with the data, adjust the plot and share their modified plot together with the underlying data. Therefore, VolcaNoseR increases the transparency and re-use of large comparative genome- and proteome-wide datasets.
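A volcano plot places each data point at (effect size, significance), commonly the log2 fold change on the x-axis and -log10(p) on the y-axis, and calls a point a hit when both values pass user-defined thresholds. The Python sketch below illustrates that logic. The function names, the log2 / -log10 axis convention and the default thresholds (|log2 FC| >= 1, p <= 0.05) are assumptions for illustration and are not taken from VolcaNoseR's code.

```python
import math


def volcano_point(log2_fc, p_value):
    """Map one measurement to volcano-plot coordinates (x, y)."""
    # x: effect size (log2 fold change); y: significance as -log10(p).
    return log2_fc, -math.log10(p_value)


def classify(log2_fc, p_value, fc_threshold=1.0, p_threshold=0.05):
    """Label a point 'up', 'down' or 'ns' (not a hit).

    Assumed rule: a hit needs p <= p_threshold AND |log2 FC| >= fc_threshold.
    """
    if p_value > p_threshold:
        return "ns"
    if log2_fc >= fc_threshold:
        return "up"
    if log2_fc <= -fc_threshold:
        return "down"
    return "ns"


# Hypothetical measurements: (name, log2 fold change, p-value).
measurements = [("GENE_A", 2.3, 0.001), ("GENE_B", -1.5, 0.01), ("GENE_C", 0.2, 1e-6)]
for name, fc, p in measurements:
    x, y = volcano_point(fc, p)
    print(f"{name}: x={x:.2f}, y={y:.2f}, {classify(fc, p)}")
```

Note that GENE_C is highly significant but its small fold change keeps it out of the hit list, which is exactly why both thresholds are applied together.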

  1. ### Reviewer #2:

    In this paper, the authors describe a web app that can create customized, labeled volcano plots. Technically, from three columns of a CSV file (log fold change, log p-value, and gene name), it displays a scatter plot with labeled dots. The app (built with R/Shiny) can be used online or run locally with R/RStudio. In itself, the app is well made, easy to use, and reactive. Compared to similar existing tools (VolcanoR, Genavi, msVolcano), it is an improvement: it is more intuitive and more "interactive". That said, it is still a single-purpose plotting tool with limited applications, as it does not perform any statistical analysis on the data.
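To make the three-column input described above concrete, the sketch below parses such a CSV into plot coordinates and attaches labels only to the most significant points. The column names (`log2FC`, `minus_log10_p`, `gene`), the assumption that the p-value column is already -log10 transformed, and the rule "label the n most significant points" are hypothetical choices for this example, not VolcaNoseR's actual input specification.

```python
import csv
import io


def labelled_points(csv_text, n=3):
    """Parse rows into (x, y, label) tuples; only the n most significant get a label.

    Assumes hypothetical columns 'log2FC', 'minus_log10_p' and 'gene',
    and that gene names are unique.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    points = [(float(r["log2FC"]), float(r["minus_log10_p"]), r["gene"]) for r in rows]
    # Rank by y = -log10(p): a larger value means a smaller p-value.
    ranked = sorted(points, key=lambda pt: pt[1], reverse=True)
    to_label = {name for _, _, name in ranked[:n]}
    # Unlabelled points keep None so a plotting layer can skip their text.
    return [(x, y, name if name in to_label else None) for x, y, name in points]


example_csv = """log2FC,minus_log10_p,gene
2.1,5.2,GENE_A
-0.3,0.4,GENE_B
-2.8,7.9,GENE_C
1.2,2.0,GENE_D
"""

for point in labelled_points(example_csv, n=2):
    print(point)
```

Every row still becomes a point on the plot; the label selection only decides which names are drawn, which is the situation the abstract describes where most points remain unannotated.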

    1. It is not possible to interact directly with the spreadsheet inside the web app, to select a subset of it, or to perform simple arithmetic operations on the columns (for example, converting a log fold change in another base to log2).

    2. The x-axis cannot be put in log-scale.

    3. Being able to export the R code that generates the plot would be a nice feature for those who want to reuse the general look of the plot in their own pipelines.

    4. It would be useful to be able to compute q-values from the p-values or to estimate a false discovery rate.
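One common way to address point 4 is the Benjamini-Hochberg procedure, whose adjusted p-values are often reported as q-values (Storey's q-value method is a related but different estimator). The sketch below assumes a plain list of raw p-values and should give the same numbers as R's `p.adjust(p, method = "BH")`; it is an illustration, not a feature of VolcaNoseR.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # taking a running minimum so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
q = bh_adjust(raw)
# Points with q <= 0.05 would be called significant at a 5% false discovery rate.
print([round(v, 4) for v in q])
```

The running minimum is what distinguishes BH from simply multiplying each p-value by m / rank: without it, a less significant point could receive a smaller adjusted value than a more significant one.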

  2. ### Reviewer #1:

    Goedhart and Luijsterburg developed an R-based web application, VolcaNoseR, for plotting a kind of scatter plot widely used in transcriptomics/proteomics research (significance vs. log fold change), also known as a volcano plot. Using VolcaNoseR, it is very easy to create nice-looking, annotated volcano plots, as the GUI provides control over most of the plot parameters, such as the labels, the significance threshold and the colour schemes. Importantly, VolcaNoseR plots are also interactive, which can be used to explore the data and get easy access to any particular gene/protein.

    1. As the authors indicate at the very beginning of their paper, volcano plots are used to visualize large amounts of data. Making scatter plots is possible with almost all existing plotting tools: from MS Excel to specialized packages in R (https://www.bioconductor.org/packages/release/bioc/vignettes/EnhancedVolcano/inst/doc/EnhancedVolcano.html) and plotly (for interactive plots). The authors make the point that VolcaNoseR differs from all these tools because it does not require the user to have any programming skills, since it has a custom-tailored GUI. However, producing and correctly interpreting the underlying big data already requires computational/coding skills that far exceed those needed to make a scatter plot (especially as many tutorials for the latter are available online, e.g. https://huntsmancancerinstitute.github.io/hciR/volcano.html).

    2. One of the main features of VolcaNoseR is the ability to make publication-ready plots. Yet any manuscript will need many more visualisations than volcano plots. For other visualisations (e.g. heatmaps, violin plots), potential users will still need other plotting tools, and will even need to be proficient in them to match the style of those visualisations with the volcano plot produced by the VolcaNoseR web app.

    3. In the section on data re-use, the authors provide a nice example of previously published data in which data points that were not annotated in the source study could be of special interest (Fig. 3). However, I doubt that inspecting the labels of hundreds of data points one by one with the cursor on the interactive plot is easier than simply filtering the underlying source data tables for significant results and searching for genes of interest in the resulting table.
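The alternative workflow the reviewer suggests, filtering the source table for significant results and then searching it for genes of interest, takes only a few lines of Python. The record layout (`gene`, `log2FC` and `p` keys), the thresholds and the gene names in the example table are made up for illustration.

```python
def significant_rows(rows, fc_threshold=1.0, p_threshold=0.05):
    """Keep rows whose |log2 fold change| and p-value both pass the thresholds."""
    return [
        r for r in rows
        if abs(r["log2FC"]) >= fc_threshold and r["p"] <= p_threshold
    ]


def find_genes(rows, genes_of_interest):
    """Return rows whose gene name is in genes_of_interest (case-insensitive)."""
    wanted = {g.upper() for g in genes_of_interest}
    return [r for r in rows if r["gene"].upper() in wanted]


# Hypothetical example table (not data from the study).
table = [
    {"gene": "GENE_A", "log2FC": 2.4, "p": 0.001},
    {"gene": "GENE_B", "log2FC": -0.2, "p": 0.60},
    {"gene": "GENE_C", "log2FC": -1.7, "p": 0.02},
    {"gene": "GENE_D", "log2FC": 1.1, "p": 0.30},
]

hits = significant_rows(table)
print([r["gene"] for r in hits])
print([r["gene"] for r in find_genes(hits, ["gene_c"])])
```

Filtering first and searching second keeps the lookup restricted to the significant subset, which mirrors scanning only the labelled region of a volcano plot.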

  3. ## Preprint Review

    This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

    ### Summary:

    VolcaNoseR is a plotting tool for making volcano plots from transcriptomics and proteomics datasets. It is easy to use, intuitive, interactive and reactive. However, VolcaNoseR does not provide any statistical analysis of the data, and it is therefore of limited applicability: a user who is able to handle and analyse this type of data is likely also able to produce a scatter plot. The bigger hurdles in this type of analysis, which could lead to new biological results, centre on the correct statistical analysis of the data, and the software does not address these problems.