Discovery of Expression-Governing Residues in Proteins
Abstract
Understanding how amino acids influence protein expression is crucial for advances in biotechnology and synthetic biology. In this study, we introduce Venus-TIGER, a deep learning model designed to accurately identify amino acids critical for expression. By constructing a two-dimensional matrix that links model representations to experimental fitness, Venus-TIGER achieves improved predictive accuracy and enhanced extrapolation capability. We validated our approach on both public deep mutational scanning datasets and low-throughput experimental datasets, demonstrating notable performance compared to traditional methods. Venus-TIGER exhibits robust transferability in zero-shot prediction scenarios and enhanced predictive performance in few-shot learning, even with limited experimental data. This capability is particularly valuable for protein design aimed at enhancing expression, where generating large datasets can be costly and time-consuming. Additionally, we conducted a statistical analysis to identify expression-associated features, such as sequence and structural preferences, distinguishing between those linked to high and low expression. Our investigation also revealed a correlation among stability, activity, and expression, providing insight into their interconnected roles and underlying mechanisms.