Cell-free protein synthesis as a method to rapidly screen machine learning-directed protease variants
This article has been reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
Machine learning (ML) tools have revolutionised protein structure prediction, engineering and design, but the best ML tool is only as good as the training data it learns from. Obtaining high-quality structural or functional data typically requires protein purification, which is time- and resource-consuming, especially at the scale required to train ML tools. Here, we showcase cell-free protein synthesis (CFPS) as a straightforward and fast tool for screening and scoring the activity of protein variants for ML workflows. We demonstrate the utility of the system by improving the kinetic properties of a protease. By rapidly screening just 48 random variants to initially sample the fitness landscape, followed by 32 more targeted variants, we identified several protease variants with improved kinetic properties.
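The two-round design in the abstract (a random first round to sample the fitness landscape, then a targeted second round) can be illustrated with a deliberately simple stand-in for the ML step. This is a minimal sketch, not the authors' model: it assumes activities are reported relative to the parent protease (parent = 1.0), estimates each mutation's effect as the mean log2 activity of round-1 variants that carry it, and ranks untested candidates by summing those effects. The function names, mutation labels and data are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean

def mutation_effects(measured):
    """Mean log2 activity (relative to parent) of screened variants carrying each mutation.

    measured: dict mapping a frozenset of mutation names to the activity measured
    in the CFPS assay, expressed relative to the parent protease (parent = 1.0).
    """
    seen = defaultdict(list)
    for muts, activity in measured.items():
        # Floor avoids log2(0) for inactive variants; the floor value is arbitrary.
        log_act = math.log2(max(activity, 1e-3))
        for m in muts:
            seen[m].append(log_act)
    return {m: mean(vals) for m, vals in seen.items()}

def pick_next_round(measured, candidates, k):
    """Score untested candidates additively in log space and return the top k."""
    effects = mutation_effects(measured)

    def score(muts):
        # Mutations absent from the first round get a neutral effect of 0.0.
        return sum(effects.get(m, 0.0) for m in muts)

    untested = [c for c in candidates if c not in measured]
    return sorted(untested, key=score, reverse=True)[:k]

# Hypothetical round-1 data (mutation names are placeholders).
round1 = {
    frozenset({"X1"}): 2.0,
    frozenset({"X2"}): 0.5,
    frozenset({"X1", "X2"}): 1.0,
    frozenset({"X3"}): 1.0,
}
pool = [frozenset({"X1", "X3"}), frozenset({"X2", "X3"}), frozenset({"X1"})]
print(pick_next_round(round1, pool, 1))
```

The additive, log-space score ignores epistasis, so it only decides which variants are worth expressing next; in the workflow described above, the CFPS screen of those variants is what supplies the new training data.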
Article activity feed
-
Overview of workflow.
I really like this figure!
-
We demonstrate CFPS’s role in the protein ML workflow by rapidly assessing hundreds of protease variants for functionality by combining CFPS with an assay for protease activity.
Really enjoyed reading this paper! It was nice to see your example of how one might combine CFPS and ML to evaluate variants!
-
homemade CFPS system
I'd love to know more about your homemade CFPS system. I'm assuming you used the Kwon & Jewett paper mentioned in the previous section, but I'm not sure it's listed in your references.
-
In region A, none of the tested variants were more active than the original Con1.
Do you have confirmation that they're being expressed and folded? It would be interesting to see how well this approach works when screening for stability rather than strictly for enzyme activity.
-
As shown in Figure 6, comparable trends of activity are observed when the variants are screened either directly in CFPS, or as purified proteins at equal concentrations, therefore indicating that trends in variant activity in CFPS are reflecting genuine differences in protease activity.
Could you have purified the protein directly from the CFPS reaction to test the assumption that, starting from the same DNA concentration, expression levels in CFPS do not differ significantly between variants?
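One way to put a number on "comparable trends of activity" between the CFPS screen and the purified proteins is a rank correlation, which asks whether the variants fall in the same order in both assays regardless of absolute activity. The paper excerpt does not say this statistic was used; the sketch below is an assumption, computing Spearman's rho as the Pearson correlation of the rank vectors (ties receive their average rank). The activity values are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical activities for the same five variants in each assay.
cfps = [0.8, 1.4, 0.3, 1.1, 0.9]
purified = [0.7, 1.6, 0.2, 1.3, 0.6]
print(round(spearman(cfps, purified), 2))  # 0.9
```

A high rho supports the ranking claim, but it cannot by itself separate specific activity from expression level, which is why purifying directly from the CFPS reaction would still be informative.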
-
The mutations in regions A and B and their differing range of activities provide two contrasting sets of data to train the ML workflow on.
It's interesting that you get different results for regions A and B. I agree that region A seems to be involved in enzymatic activity and has less flexibility, but I wonder whether combining mutations from regions A and B could improve activity even further.
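A combinatorial follow-up like the one suggested could start by pairing every region-A mutation with every region-B mutation and ranking the pairs. This is a sketch under a strong, labelled assumption: it predicts a double mutant's activity as the product of the two single-mutant fold-changes (relative to the parent, parent = 1.0), which presumes the mutations act independently. The function name and mutation labels are hypothetical.

```python
from itertools import product

def rank_combinations(region_a, region_b):
    """Pair each region-A mutation with each region-B mutation and rank the pairs.

    region_a, region_b: dict mapping mutation name -> activity relative to the
    parent (parent = 1.0). The prediction multiplies the two single-mutant
    fold-changes, i.e. it assumes no epistasis between the two regions.
    """
    combos = [
        ((ma, mb), fa * fb)
        for (ma, fa), (mb, fb) in product(region_a.items(), region_b.items())
    ]
    # Highest predicted activity first.
    return sorted(combos, key=lambda c: c[1], reverse=True)

# Hypothetical single-mutant activities.
region_a = {"a1": 0.9, "a2": 1.0}
region_b = {"b1": 1.5, "b2": 0.8}
for pair, predicted in rank_combinations(region_a, region_b):
    print(pair, round(predicted, 2))
```

Because region A may sit near the active site, epistatic effects there are plausible, so a ranking like this would only prioritise which combinations to express and screen in CFPS rather than predict their activity.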
-