Discovery of Expression-Governing Residues in Proteins

Abstract

Understanding how amino acids influence protein expression is crucial for advances in biotechnology and synthetic biology. In this study, we introduce Venus-TIGER, a deep learning model designed to accurately identify the residues critical for expression. By constructing a two-dimensional matrix that links model representations to experimental fitness, Venus-TIGER achieves improved predictive accuracy and stronger extrapolation. We validated our approach on both public deep mutational scanning datasets and low-throughput experimental datasets, where it performed well relative to traditional methods. Venus-TIGER shows robust transferability in zero-shot prediction scenarios and improved predictive performance in few-shot learning, even with limited experimental data. This capability is particularly valuable for protein design aimed at enhancing expression, where generating large datasets can be costly and time-consuming. Additionally, we conducted a statistical analysis to identify expression-associated features, such as sequence and structural preferences, and to distinguish those linked to high expression from those linked to low expression. Our investigation also revealed correlations among stability, activity, and expression, providing insight into their interconnected roles and underlying mechanisms.

Article activity feed

Very interesting work! I'm curious about the effects of using training data from multiple expression systems (bacteria, fungi, mammalian cells), particularly since expression requirements can vary slightly between organisms. Have you explored whether expression-system-specific models perform better when predicting expression within a given system? Or is the training data biased toward one particular expression system, potentially leading to worse predictions for the others? Or has the model really learned general features of expression across these organisms? Great work!