Evaluation of Machine Learning-Assisted Directed Evolution Across Diverse Combinatorial Landscapes
This article has been reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
Various machine learning-assisted directed evolution (MLDE) strategies have been shown to identify high-fitness protein variants more efficiently than typical wet-lab directed evolution approaches. However, limited understanding of the factors influencing MLDE performance across diverse proteins has hindered optimal strategy selection for wet-lab campaigns. To address this, we systematically analyzed multiple MLDE strategies, including active learning and focused training using six distinct zero-shot predictors, across 16 diverse protein fitness landscapes. By quantifying landscape navigability with six attributes, we found that MLDE offers a greater advantage on landscapes which are more challenging for directed evolution, especially when focused training is combined with active learning. Despite varying levels of advantage across landscapes, focused training with zero-shot predictors leveraging distinct evolutionary, structural, and stability knowledge sources consistently outperforms random sampling for both binding interactions and enzyme activities. Our findings provide practical guidelines for selecting MLDE strategies for protein engineering.
Article activity feed
-
Percent active
I would be interested to see how this factor affects the extent to which MLDE can improve over DE. It seems to me that the closer a landscape is to 100% or 0% active, the more limited the training data would be, such that the model may not differentiate as clearly as it would with awareness of both active and inactive variants.
-
Landscape and functional attributes affect ZS predictability
This section was really interesting. I wonder where other aspects of function (stability, regulability/tunability) might fall in terms of ZS predictability. I'd be curious to see which aspects are most (and least) predictable with and without focused training.
-
Figure 5. Decision tree summarizing recommended ML strategies based on total number of variants screened experimentally, landscape navigability (e.g. active variant percentage, pairwise epistasis), the quality of ZS active/inactive variant classification (i.e. ROC-AUC > 0.5), and the number of available screening rounds (N)
Love the format of a decision tree as a final summary! Looking forward to seeing it expand as more nuances are explored in this space.
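The ROC-AUC > 0.5 branch point in the decision tree can be illustrated with a minimal sketch. ROC-AUC here measures how well a zero-shot score separates active from inactive variants; it equals the probability that a randomly chosen active variant outscores a randomly chosen inactive one (the Mann-Whitney interpretation). The scores and labels below are hypothetical toy values, not data from the paper.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC-AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen active variant outscores a randomly chosen inactive one."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Count pairwise wins (active score > inactive score); ties count as half.
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Hypothetical ZS scores with binary active/inactive labels.
zs_scores = np.array([0.9, 0.8, 0.4, 0.35, 0.1])
active = np.array([1, 1, 0, 1, 0])
auc = roc_auc(zs_scores, active)
print(auc)  # 5/6 ~ 0.833: above 0.5, so ZS-guided sampling would be favored
```

An AUC of 0.5 corresponds to a ZS predictor no better than chance at distinguishing active from inactive variants, which is why the tree uses it as the cutoff for trusting ZS-focused training.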
-
depth of multiple sequence alignments (MSAs)
I would suggest describing/defining here exactly what you mean by MSA depth - does this involve collection/use of out-of-sample natural sequences to construct the MSA? Is depth here the number of aligned sequences outside of those you are predicting fitness for?
-
The ESM score
I would suggest clarifying earlier in the results section that across all models, the feature encodings used were ESM embeddings; as is, this doesn't clearly come across without first going into the methods.
-
including the best Hamming distance ensemble, averaged across 12 landscapes. Shading indicates standard deviation.
I presume this is in reference to "Hamming distance EVmutation," plotted as the brown dotted line? This does quite well, but it's unclear from the figure what this actually is or how it incorporates the two features to make its predictions, since it's not described/indicated elsewhere in the figure.
-
Considering the variability in throughput and expense of experimental screens, we explored a range of total number of variants screened (total sample size), from 120 to 2,016 samples (Figure 2a; Table S2).
You haven't yet mentioned what ML model or model architecture you are actually using to make fitness predictions. These details may be in the methods, but I would strongly suggest a section/paragraph briefly describing this to improve clarity!
-
local optimum
Is there a reason why you've chosen to use "local optimum" instead of the more commonly used term of "fitness peak" in the fitness landscape literature?
-
We reasoned that the number of KDE peaks, reflecting the distribution modalities of fitness, could serve as a proxy for the underlying landscape navigability, which impacts the outcome of DE.
I like this idea - but it seems like it would be quite valuable for you to explicitly assess the extent to which this is true. For instance, does the number of KDE peaks correlate with landscape ruggedness? Or the number of fitness peaks, or fitness sinks?
I might suggest adding in some of these basic exploratory tests investigating the relationships between the fitness statistics and fitness landscape measures to your work, either in the main text or supplement.
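The kind of exploratory check suggested above could start from something as simple as counting the modes of a kernel density estimate of the fitness distribution. The sketch below is a minimal illustration under assumed toy data (a bimodal landscape of mostly inactive variants plus a small active mode); it is not the paper's implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import find_peaks

def count_kde_peaks(fitness, grid_size=512):
    """Count modes (local maxima) in the KDE of a fitness distribution."""
    kde = gaussian_kde(fitness)  # Gaussian kernels, Scott's rule bandwidth
    grid = np.linspace(fitness.min(), fitness.max(), grid_size)
    density = kde(grid)
    peaks, _ = find_peaks(density)
    return len(peaks)

# Toy bimodal landscape: 90% inactive variants near 0, 10% active near 1.
rng = np.random.default_rng(0)
fitness = np.concatenate([
    rng.normal(0.0, 0.05, 900),  # inactive variants
    rng.normal(1.0, 0.10, 100),  # active variants
])
print(count_kde_peaks(fitness))  # 2 modes for this toy landscape
```

Correlating this peak count against independent ruggedness measures (number of fitness peaks, extent of sign epistasis) across the 16 landscapes would directly test whether KDE modality tracks navigability.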
-
ten three- or four-site landscapes of the thermostable β-subunit of tryptophan synthase (TrpB)
Is it actually appropriate to consider these as being fundamentally distinct from each other? It seems like it would be more apt to describe them as different, but (in some cases) partially overlapping regions of the same activity landscape.