Signal Peptide Efficiency: from High-throughput Data to Prediction and Explanation
The passage of proteins across biological membranes via the general secretory (Sec) pathway is a universally conserved process with critical functions in cell physiology and important industrial applications. Proteins are directed into the Sec pathway by a signal peptide at their N-terminus. Estimating the impact of physicochemical signal peptide features on protein secretion levels has not been achieved so far, partially due to the extreme sequence variability of signal peptides. To elucidate relevant features of the signal peptide sequence that influence secretion efficiency, an evaluation of ~12,000 different designed signal peptides was performed using a novel miniaturized high-throughput assay. The results were used to train a machine learning model, and a post-hoc explanation of the model is provided. By describing each signal peptide with a selection of 156 physicochemical features, it is now possible to both quantify feature importance and predict the protein secretion levels directed by each signal peptide. Our analyses allow the detection and explanation of the relevant signal peptide features influencing the efficiency of protein secretion, generating a versatile tool for the in silico evaluation of signal peptides.