PARROT is a flexible recurrent neural network framework for analysis of large protein datasets

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    The authors report a user-friendly software package (PARROT) that allows non-experts to use machine learning approaches to analyze high-throughput experiments on proteins. This package will allow more scientists to apply these powerful machine learning methods, thus increasing our ability to understand the chemistry and biological function of proteins.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

Abstract

The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret these data. Among these tools, machine learning approaches are increasingly used because of their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable to a wide array of biological problems.

Article activity feed

  1. Reviewer #1 (Public Review):

    This is a well-written, timely manuscript describing a robust and user-friendly machine learning method tailored for high-throughput studies of proteins (PARROT). The authors provide an accessible description of the method and several case studies demonstrating its utility. The software source code is readable and I was able to run their example analyses. The documentation is excellent and was easy to navigate. I think the tool will be useful for the many groups doing high-throughput experiments that are not machine learning experts.

  2. Reviewer #2 (Public Review):

    The analysis of large data sets obtained from omics or other approaches is often the most time-consuming and difficult step of a study. Deep learning and related computational approaches offer the possibility of training software on a given data set and then using it to analyze large new experimental data sets. The authors describe the software architecture and demonstrate its application to three different topics: prediction of phosphorylation, prediction of the transactivation potential of peptides, and prediction of aggregation propensity. They compare the results of their new software, PARROT, with those of existing software tools.

    Overall, the performance of the new software tool seems excellent. In particular, its flexibility will make PARROT a very useful tool for the analysis of large data sets.