HyperMPNN – A general strategy to design thermostable proteins learned from hyperthermophiles

Abstract

Stability is a key factor in enabling the use of recombinant proteins in therapeutic or biotechnological applications. Deep learning protein design approaches like ProteinMPNN have shown strong performance both in creating novel proteins and in stabilizing existing ones. However, it is unlikely that the stability of the designs will significantly exceed that of the natural proteins in the training set, which are biophysically only marginally stable. Therefore, we collected predicted protein structures from hyperthermophiles, which differ substantially in their amino acid composition from mesophiles. Notably, ProteinMPNN fails to recover their unique amino acid composition. Here we show that a network retrained on predicted protein structures from hyperthermophiles, termed HyperMPNN, not only recovers this unique amino acid composition but can also be applied to proteins from non-hyperthermophiles. Applying this novel approach to a protein nanoparticle with a melting temperature of 65°C resulted in designs that remained stable at 95°C. In conclusion, we created a new way to design highly thermostable proteins through self-supervised learning on data from hyperthermophiles.

Article activity feed

  1. whereas the HyperMPNN construct exhibited a lower level of soluble yield in contrast to the parent and ProteinMPNN sequence (I53-50B.HMPNN: 0.4 mg/L culture; I53-50B: 20.4 mg/L culture; I53-50B.PMPNN: 25.8 mg/L culture)

    It's amazing that you made such a thermostable protein this way, but the decreased yield seems like it could be a limitation. Not necessarily for this study, but in future work it would be interesting to know whether proteins designed this way tend to have a lower soluble yield.

  2. However, we saw no substantial difference in the number of salt bridges between proteins from E. coli (median 16.2) and hyperthermophiles (median 17.0). Intriguingly, for redesigns of E. coli proteins the ProteinMPNN designed sequences only had a median of 8.8 salt bridges per protein compared to the 17.0 median of HyperMPNN (Fig. 4B).

    This is interesting! Do you have any thoughts about this? Maybe ProteinMPNN has a bias towards fewer salt bridges in general?

  3. mesophile E. coli

    I'm curious how the protein content of the two organisms compares. Do they have many of the same proteins, just optimized for high temperature vs. not?

  4. It could be observed that the core of proteins from hyperthermophiles has 4.4% more apolar residues than the E. coli reference. For the surface, proteins from hyperthermophiles had a 3.9% increase in positively charged residues, a 4.1% increase in apolar residues, a 4.6% reduction in polar residues and a 4.6% reduction in others. For the core, a 4.4% increase in apolar residues in proteins from hyperthermophiles was observed.

    It's a bit hard to put these percentages into context. I wonder if having a graph that shows the actual values for each group instead of just the difference would be helpful. Like I'm curious how much variation there is on a protein-to-protein basis and how significant these differences are in relation to that.
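
One way to look at per-protein variation rather than only the group difference, as the last comment suggests, is to compute residue-class fractions for each protein and then summarize their spread. The following is only a minimal sketch: the grouping of amino acids into classes in `RESIDUE_CLASSES` is an assumption and may not match the grouping used in the article, the function names are hypothetical, and it works on whole sequences, so it does not separate core from surface (that would need burial information derived from the structures).

```python
from statistics import median

# ASSUMPTION: illustrative grouping of the 20 standard amino acids.
# The article's own class definitions may differ.
RESIDUE_CLASSES = {
    "positive": set("KR"),
    "negative": set("DE"),
    "apolar": set("AVLIMFW"),
    "polar": set("STNQYH"),
    "other": set("GPC"),
}


def class_fractions(sequence):
    """Return the fraction of residues in each class for one sequence.

    Non-standard letters (e.g. 'X') fall into no class, so the
    fractions can sum to less than 1 for such sequences.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {
        name: sum(aa in members for aa in seq) / len(seq)
        for name, members in RESIDUE_CLASSES.items()
    }


def summarize(sequences):
    """Summarize per-protein class fractions as median, min and max."""
    per_protein = [class_fractions(s) for s in sequences]
    summary = {}
    for name in RESIDUE_CLASSES:
        values = [fractions[name] for fractions in per_protein]
        summary[name] = {
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the article.
    toy_set = ["MKKLLEEAIRK", "MSTNQGAVLDE", "MKRIEELLKKA"]
    for cls, stats in summarize(toy_set).items():
        print(cls, stats)
```

Running `summarize` separately on each group of proteins and comparing the resulting ranges would show whether a difference of a few percentage points is large or small relative to the protein-to-protein spread.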