Autoprot: Processing, Analysis and Visualization of Proteomics Data in Python

Abstract

The growing number of complex quantitative mass spectrometry-based proteomics data sets demands a standardised and reliable analysis pipeline. Python-based analysis, particularly through Jupyter notebooks, is a simple yet powerful tool for this purpose. However, Python software for standardised and accessible MS data analysis is scarce, and what exists is often limited to analysis functions written in Python. This excludes existing and well-tested software written in other languages, for example R. At the same time, Python offers several interactive data visualisation modules that greatly enhance exploratory research and make it easier to communicate results to collaboration partners. Consequently, there is a need for an integrated, Jupyter-compatible Python analysis pipeline that incorporates R algorithms and interactive visualisation for proteomics data analysis.

Summary

We developed autoprot, a Python module for simplified analysis of quantitative mass spectrometry-based proteomics experiments processed with the MaxQuant software. It provides access to established functions written in both Python and R for statistical testing and data transformation. Moreover, it generates JavaScript-based interactive plots that can be integrated into interactive web applications. Thereby, autoprot offers standardised, fast and reliable proteomics data analysis while maintaining the high customisability required to tailor the analysis pipeline to specific experiments.

Availability and Implementation

Autoprot is implemented in Python ≥ 3.9 and can be downloaded from https://github.com/ag-warscheid/autoprot . Online documentation is available at https://ag-warscheid.github.io/autoprot/ .

Article activity feed

  1. Thanks for the remark. You are correct: in its current form, autoprot requires a local R installation for installing and running R code. Autoprot is used from within a Python installation but calls several well-established R packages for statistical analysis (through subprocess). This was a design choice to avoid re-implementing, and maintaining in the long term, a complete set of statistical packages, which is beyond what we can do as an academic lab.

    I agree that installing a Python-only version from pip would be a nice option if the R-based statistics are not needed. We will look into implementing this.

    Regarding the dependencies, please see my reply to your second comment below.

  2. These tests are available in Python, with autoprot working as a shim that hands the arguments down to R (via subprocess). This does complicate maintaining an identical environment across different systems, especially across different operating systems. As a first step, autoprot automatically saves lists of the Python and R environments (with version numbers) during a run, which simplifies installing matching versions on a second system. In the long run, we aim to provide a containerized version of autoprot that includes a defined set of R packages.

  3. Advanced statistical R algorithms are invoked through a dedicated R installation

    From the figure below, it looks like these statistical tests would be readily available in Python, or easy to write functions for? From an installation and maintenance standpoint alone, code that depends on two different languages is difficult to maintain.

  4. It provides access to established functions written in both Python and R for statistical testing and data transformation.

    From the way this preprint is written, it sounds like this software is a Python package, but it needs both Python and R installed to work? It would probably be best to either write the entire package in one language, or keep the Python and R parts separate so that each can be installed with pip or from CRAN/devtools, depending on the language. Otherwise, keeping up with the dependencies for both languages down the line could be difficult.
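
To make the discussion above concrete, here is a minimal sketch of the two mechanisms the authors mention: a Python shim that hands arguments to R through subprocess, and a snapshot of the Python and R environments with version numbers. It is not autoprot's actual code. All function names, the script name and the output file layout are hypothetical, and it assumes that an Rscript executable is on the PATH whenever R is actually invoked.

```python
import json
import shutil
import subprocess
import sys
from importlib import metadata

# R one-liner (assumption: run via `Rscript -e`) printing "package==version" lines.
R_LIST_PACKAGES = (
    "ip <- installed.packages(); "
    "cat(paste(ip[, 'Package'], ip[, 'Version'], sep = '=='), sep = '\\n')"
)


def build_r_command(script, args, rscript="Rscript"):
    """Return the argument list for running an R script non-interactively."""
    return [rscript, "--vanilla", str(script), *(str(a) for a in args)]


def run_r_script(script, args, rscript="Rscript"):
    """Run a (hypothetical) R script in a subprocess and return its stdout."""
    if shutil.which(rscript) is None:
        raise FileNotFoundError(f"{rscript!r} not found; a local R installation is required")
    done = subprocess.run(
        build_r_command(script, args, rscript),
        capture_output=True, text=True, check=True,
    )
    return done.stdout


def python_environment():
    """Map installed Python distribution names to their versions."""
    env = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            env[name] = dist.version
    return env


def r_environment(rscript="Rscript"):
    """Map installed R package names to versions, or None if R is unavailable."""
    if shutil.which(rscript) is None:
        return None
    done = subprocess.run(
        [rscript, "--vanilla", "-e", R_LIST_PACKAGES],
        capture_output=True, text=True, check=True,
    )
    pairs = (line.split("==", 1) for line in done.stdout.splitlines() if "==" in line)
    return {name: version for name, version in pairs}


def save_environment(path, rscript="Rscript"):
    """Write a JSON snapshot of both environments and return it as a dict."""
    snapshot = {
        "python_version": sys.version,
        "python_packages": python_environment(),
        "r_packages": r_environment(rscript),
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(snapshot, fh, indent=2, sort_keys=True)
    return snapshot
```

A call such as `run_r_script("limma_test.R", ["input.csv", "output.csv"])` (script name hypothetical) would hand the file paths to R, while `save_environment("environment.json")` records the versions needed to rebuild a matching setup on a second system. Because the shim returns only what R prints, this layout keeps the Python side free of R bindings, at the cost of requiring both runtimes to be installed.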