Systematic comparison of Generative AI-Protein Models reveals fundamental differences between structural and sequence-based approaches.

Abstract

Recent advances in artificial intelligence have led to the development of generative models for de novo protein design. We compared 13 state-of-the-art generative protein models, assessing their ability to produce feasible, diverse, and novel protein monomers. Structural diffusion models generally create designs with higher confidence in predicted structures and more biologically plausible energy distributions, but exhibit limited diversity and strong sequence biases. Conversely, protein language models generate more diverse and novel designs but with lower structural confidence. We also evaluated these models' ability to generate unique proteins conditioned on the Tobacco Etch Virus (TEV) protease. Generative models were successful in producing functional enzymes, albeit with diminished activity compared to the wild-type TEV. Our systematic benchmarking provides a foundation for evaluating and selecting generative protein models, while highlighting the complementary strengths of different generative paradigms. This framework will facilitate an informed application of these tools for biomedical engineering and design.

Article activity feed

  1. Monomers generated through structural diffusion appear to only occupy a small region in comparison to both the UniRef50 and PISCES sequences, whereas generative models of sequences appear to more evenly populate the space of similar length natural proteins.

    Taxonomic biases of the training data likely also play an important role here. The data sources aren't equal in how they've sampled the protein universe. This is especially apparent when comparing the structure and sequence databases. For example, certain taxa (e.g., humans) are overrepresented in the PDB, while others dominate UniRef.

    It's not hard to imagine how the distribution differences in the t-SNE might reflect this, especially given the strong overlap of sequence-based methods with the UniRef samples. Do you know if the same is true for the structure-based methods? If you visualized where, say, PDB proteins are, would there be strong overlap?

    Any ideas on how to disentangle approach from the taxonomic makeup of training data?

  2. The transmembrane domain protein fused tTA is localised to the plasma membrane, and thus the GFP signal is low in the absence of an active TEV protease, but an active protease cleaves tTA enabling its translocation to the nucleus and induction of GFP expression

    Do you have a method to control for or normalize for differences in the expression of your TEV proteases?

  3. Twenty-three of the 110 selected design monomers could not be successfully cloned, potentially as a result of the instability or toxicity of the synthesised sequence towards the host E. coli cells.

    Cloned or expressed? It would be interesting to know more about the ones that you weren't able to produce. Which methods were they from? How different were they from the starting protein? Did the cells die, or did you just get super low yield?

  4. To compare the catalytic activity, designed monomers were expressed in BHK21 cells together with a tetracycline inducible green fluorescent protein (GFP) and a synthetic protein consisting of tetracycline-controlled transactivator (tTA) tethered via a linker containing the TEV endogenous catalytic site (ENLYFQ’S) to a transmembrane domain protein. The transmembrane domain protein fused tTA is localised to the plasma membrane, and thus the GFP signal is low in the absence of an active TEV protease, but an active protease cleaves tTA enabling its translocation to the nucleus and induction of GFP expression (Figure 4A).

    Very cool paper! Really great to see a (rare) comparison between all these different methods. I’m very interested in the experimental readout. Do you have any thoughts on how the in-cell GFP assay might be influenced by factors like expression level, stability, or translational efficiency? Just curious if you think those could affect the comparisons at all.
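
Comment 1 asks whether designs from structure-based methods would overlap PDB proteins in the same way that sequence-based designs overlap the UniRef samples. One way to put a number on that overlap, rather than judging it from a t-SNE plot, is to ask which reference set holds each design's nearest neighbour in embedding space. The sketch below is only an illustration and is not taken from the article: the function name `nearest_reference_fraction`, the 16-dimensional random vectors standing in for protein embeddings, and the choice of Euclidean distance are all assumptions.

```python
import numpy as np


def nearest_reference_fraction(generated, references):
    """Return, per reference set, the fraction of generated embeddings
    whose nearest neighbour (Euclidean distance) lies in that set.

    generated  : array of shape (n_generated, dim)
    references : dict mapping a set name to an array of shape (n_ref, dim)
    """
    names = list(references)
    # For each generated point, the smallest distance to any point in each set.
    min_dists = np.stack(
        [
            np.linalg.norm(
                generated[:, None, :] - references[name][None, :, :], axis=-1
            ).min(axis=1)
            for name in names
        ],
        axis=1,
    )
    # Index of the reference set that contains the closest point.
    winners = min_dists.argmin(axis=1)
    return {name: float(np.mean(winners == i)) for i, name in enumerate(names)}


# Synthetic stand-ins for real embeddings (hypothetical data, not from the study).
rng = np.random.default_rng(0)
uniref_like = rng.normal(0.0, 1.0, size=(200, 16))
pdb_like = rng.normal(3.0, 1.0, size=(200, 16))
generated = rng.normal(3.0, 1.0, size=(50, 16))  # drawn near the "PDB-like" cloud

fractions = nearest_reference_fraction(
    generated, {"uniref": uniref_like, "pdb": pdb_like}
)
print(fractions)
```

Computing the fractions in the original embedding space, before any t-SNE projection, avoids reading too much into distances that the projection may distort. It would not by itself separate the effect of the modelling approach from the taxonomic makeup of the training data, which is the harder question raised in the comment.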