STAMPS: Signal-peptide Transformer for Augmenting Mammalian Protein Secretion

Abstract

While secretion plays a key role in the diverse applications of cell engineering, to date only a handful of mammalian signal peptides have been characterised in depth and systematic efforts to build novel variants remain sporadic. We present STAMPS (Signal-peptide Transformer for Augmenting Mammalian Protein Secretion), a generative, autoregressive transformer fine-tuned on ∼6,000 mammalian signal peptides to design de novo sequences that modulate and enhance secretion across proteins and hosts. We show that STAMPS can be used to identify candidate signal peptides that outperform the widely used IgG κ light-chain leader (IgKL) when used to secrete EGFP in HEK293T and CHO cells, and hEPO in HEK293T cells. When incorporated in an industrial cell line development framework, STAMPS leads to the generation of signal peptides that yield a ∼2.3-fold gain in secretion of a VHH-Fc compared to the internal industrial benchmark in CHO G22 cells, with the same candidates ranking highly in both CHO and HEK293T hosts. Sequence-to-function analysis highlights a longer, strongly hydrophobic core and a tightly positioned cleavage site as drivers of this strong performance. Together, these results establish data-driven, generative design of mammalian signal peptides as a practical route to tune and improve secretion for bioproduction and cell engineering applications.
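The abstract attributes strong performance to a longer, strongly hydrophobic core and a tightly positioned cleavage site. As a rough illustration of how such features can be measured on a candidate sequence, the sketch below finds the longest contiguous run of hydrophobic residues, scores it with the Kyte-Doolittle hydropathy scale, counts the residues between that core and the cleavage site, and checks the (-3, -1) small-residue rule. This is not the STAMPS sequence-to-function pipeline: the hydrophobic residue set, the assumption that the input is cleaved right after its last residue, the function name `core_features`, and the example sequence are all hypothetical choices made for illustration.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumptions for this sketch, not taken from the paper:
HYDROPHOBIC = set("AVILMFWC")  # residues counted as part of the hydrophobic core
SMALL = set("AGSCT")           # small residues checked at positions -3 and -1


def core_features(sp: str) -> dict:
    """Heuristic hydrophobic-core and cleavage-site features for a signal peptide.

    Assumes `sp` is the signal peptide alone, cleaved immediately after its last residue.
    """
    sp = sp.upper()
    best_start, best_len, run_start = 0, 0, None
    for i, aa in enumerate(sp + "*"):  # "*" sentinel closes a run at the C-terminus
        if aa in HYDROPHOBIC:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_len:  # keep the first longest run
                best_start, best_len = run_start, i - run_start
            run_start = None
    core = sp[best_start:best_start + best_len]
    mean_kd = sum(KD[aa] for aa in core) / best_len if best_len else 0.0
    return {
        "core": core,
        "core_len": best_len,
        "core_mean_kd": round(mean_kd, 2),
        # residues between the end of the core and the (assumed) cleavage site
        "core_to_cleavage": len(sp) - (best_start + best_len),
        # von Heijne's (-3, -1) rule: small residues just upstream of the cleavage site
        "small_at_minus3_minus1": len(sp) >= 3 and sp[-3] in SMALL and sp[-1] in SMALL,
    }


print(core_features("MRSLLVLLAIFLGSQA"))  # hypothetical sequence, not from the paper
```

For the hypothetical sequence, the core found is `LLVLLAIFL` (9 residues, mean hydropathy of about 3.59), with 4 residues between the core and the cleavage site and small residues at -3 and -1. A windowed hydropathy profile or a dedicated predictor such as SignalP would give more robust region boundaries; the run-based definition is used here only because it is easy to read.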

Article activity feed

  1. Overall, this work demonstrates how the combination of Transformer-based generative models and oligo pool-based cloning offers a practical, fast and affordable route to optimise signal peptides for mammalian secretion and contributes to widening the toolbox of signal peptides available to cell engineering applications, also supporting the development of more efficient expression systems for biomanufacturing.

    Really cool work, I like the model + experimental validation all in one story!

  2. These were randomly generated and predicted to lead to secretion of EGFP by SignalP6.0

    Did you ever test any that weren't predicted to lead to secretion by SignalP? Just out of curiosity!

  3. Out of the total, 34 peptides led to secretion levels higher than Alb but lower than our high secretion control IgKL (∼38%), while 10 signal peptides led to secretion levels higher than IgKL (∼11%)

    Nice!

  4. Surprisingly, we also found that the untagged EGFP control was present in the culture supernatant at high levels for CHO-S cells.

    Is the untagged EGFP control the one labelled just "EGFP" in Figure 1C-D? Does that mean the untagged EGFP outperformed many of the tagged constructs, including some of the controls in CHO-S?

  5. The increase in intracellular EGFP in the presence of brefA was observed only in the tagged condition, confirming that IgKL acts as a peptide sequence inducing protein secretion

    I like this validation, and how it gives you a visual of what the endpoint is. It might be worth quantifying it and explaining a bit more what you were hoping to gain from the brefA, since you don't use it again. Also, definitely not necessary, but I wonder whether including the Alb signal peptide might have been a useful demonstration of the difference between it and the IgKL peptide.
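Highlight 3 sorts candidates into tiers relative to two controls (above the Alb control but below IgKL, and above IgKL) and reports each tier as a fraction of all candidates screened, while the abstract reports a fold change against a benchmark. A minimal sketch of that bookkeeping is below; the function names, tier labels, the handling of ties, and all readout values are hypothetical and not taken from the paper.

```python
from collections import Counter


def assign_tiers(secretion: dict[str, float], low_ctrl: float, high_ctrl: float) -> dict[str, str]:
    """Place each candidate relative to a low (e.g. Alb) and a high (e.g. IgKL) control.

    Ties are resolved conservatively: a value equal to a control does not count as exceeding it.
    """
    tiers = {}
    for name, value in secretion.items():
        if value > high_ctrl:
            tiers[name] = "above_high"
        elif value > low_ctrl:
            tiers[name] = "between"
        else:
            tiers[name] = "at_or_below_low"
    return tiers


def tier_fractions(tiers: dict[str, str]) -> dict[str, float]:
    """Fraction of all screened candidates that falls in each tier."""
    counts = Counter(tiers.values())
    total = len(tiers)
    return {t: counts[t] / total for t in ("above_high", "between", "at_or_below_low")}


def fold_change(value: float, benchmark: float) -> float:
    """Secretion relative to a benchmark signal peptide (1.0 means parity)."""
    return value / benchmark


# Hypothetical normalised secretion readouts, not data from the paper.
readouts = {"SP1": 0.5, "SP2": 1.4, "SP3": 2.6, "SP4": 0.9}
tiers = assign_tiers(readouts, low_ctrl=1.0, high_ctrl=2.0)
print(tiers)
print(tier_fractions(tiers))
print(fold_change(readouts["SP3"], benchmark=2.0))
```

With these made-up values, one candidate in four lands above the high control, one between the controls, and two at or below the low control; SP3 shows a 1.3-fold change over the benchmark.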