Inferring protein from mRNA concentrations using convolutional neural networks
This article has been reviewed by the following groups
Listed in
- Evaluated articles (preLights)
Abstract
Transcript abundance is a widely used but poor predictor of protein abundance. Because proteins are the actual agents executing biological functions, and because signaling outcomes depend non-linearly on the concentrations of network components, we aimed to develop a convolutional neural network (CNN)-based predictor of protein abundance for Homo sapiens and the reference plant Arabidopsis thaliana. After hyperparameter optimization and initial analysis of the training data, we employed distinct training modules for value data and sequence data, explaining 40% of the variance in protein levels in Homo sapiens and 48% in Arabidopsis thaliana. Codon counts and peptides had the greatest predictive power. Extracting the learned weights revealed generally similar trends but also some intriguing differences between human and Arabidopsis. Many learned motifs in the 5′ and 3′ UTRs correspond to previously described regulatory features, demonstrating that the model can learn mechanistically relevant features ab initio.
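To make two terms from the abstract concrete, the sketch below shows one plausible way to turn a coding sequence into a codon-count feature vector and one common way to compute the fraction of variance explained. It is an illustration only and not the authors' pipeline: the function names `codon_counts` and `variance_explained`, the RNA-alphabet codon ordering, and the use of the coefficient of determination as the variance measure are all assumptions made here.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order (RNA alphabet; the ordering is an assumption).
CODONS = ["".join(c) for c in product("ACGU", repeat=3)]


def codon_counts(cds):
    """Return a 64-element list of codon counts for a coding sequence.

    Hypothetical helper: DNA input is converted to RNA, and trailing
    bases that do not form a complete codon are ignored.
    """
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    return [counts.get(codon, 0) for codon in CODONS]


def variance_explained(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: the abstract's "variance in protein levels" is read here
    as R^2; the paper may define its metric differently.
    """
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot
```

Under this reading, a model that explains 40% of the variance would correspond to `variance_explained(observed, predicted)` returning about 0.40 on held-out protein levels.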
Article activity feed
Excerpt
Convolutional Gene Expression Detective: Decoding mRNA Features to Predict Protein Abundance