Exploring evolution to uncover insights into protein mutational stability

Abstract

Determining the impact of mutations on the thermodynamic stability of proteins is essential for a wide range of applications such as rational protein design and genetic variant interpretation. Since protein stability is a major driver of evolution, evolutionary data are often used to guide stability predictions. Many state-of-the-art stability predictors extract evolutionary information from multiple sequence alignments (MSAs) of proteins homologous to a query protein, and leverage it to predict the effects of mutations on protein stability. To evaluate the power and the limitations of such methods, we used the massive amount of stability data recently obtained by deep mutational scanning to study how best to construct MSAs and optimally extract evolutionary information from them. We tested different evolutionary models and found that, unexpectedly, independent-site models achieve similar accuracy to more complex epistatic models. A detailed analysis of the latter models suggests that their inference often results in noisy couplings, which do not appear to add predictive power over the independent-site contribution, at least in the context of stability prediction. Interestingly, by combining any of the evolutionary features with a simple structural feature, the relative solvent accessibility of the mutated residue, we achieved similar prediction accuracy to supervised, machine learning-based, protein stability change predictors. Our results provide new insights into the relationship between protein evolution and stability, and show how evolutionary information can be exploited to improve the performance of mutational stability prediction.
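
To make the notion of an independent-site model concrete, the following is a minimal sketch of one common formulation: a per-column log-odds score computed from amino acid frequencies in an MSA. It is not the authors' implementation. The toy alignment, the pseudocount value, and the function names `site_frequencies` and `log_odds_score` are assumptions made purely for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(msa, pseudocount=1.0):
    """Per-column amino acid frequencies, ignoring gaps and unknown symbols.

    The pseudocount (an assumed value) keeps unobserved residues from
    getting a frequency of exactly zero.
    """
    length = len(msa[0])
    freqs = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        freqs.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return freqs


def log_odds_score(freqs, position, wt, mt):
    """Independent-site score: log of mutant over wild-type column frequency.

    Each position is scored on its own; no couplings between positions
    are modelled, which is what separates this from an epistatic model.
    """
    return math.log(freqs[position][mt] / freqs[position][wt])


# Hypothetical five-sequence alignment; the first row plays the query.
toy_msa = ["MKVLA", "MKILA", "MRVLA", "MKVLG", "MKV-A"]
freqs = site_frequencies(toy_msa)

# Mutating the fully conserved M at column 0 scores lower than a
# substitution that is already observed in the alignment (V to I, column 2).
conserved = log_odds_score(freqs, 0, "M", "W")
tolerated = log_odds_score(freqs, 2, "V", "I")
print(conserved, tolerated)
```

In this formulation a more negative score means the mutant residue is rarer than the wild-type residue at that column, which is usually read as a sign that the substitution is less well tolerated. Real pipelines typically add sequence reweighting and a more careful treatment of gaps, both omitted here.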

Article activity feed

  1. Similar results were observed for epistatic models other than pycofitness.

    Given the strength of the claims you are making here, I would suggest reporting these results explicitly; as it stands, the reader has no way to evaluate this claim.

  2. JackHMMER

    While useful for assembling sequences for your MSA, HMM-based methods for MSA construction are unlikely to be the most accurate. Given that you are interested in the impact of MSA construction on your predictions, it might be worth considering the more accurate alignment methods typically used in phylogenetic inference.

  3. Finally, we used one feature based on the protein 3D structure, and defined a way of combining it with the evolutionary features described above:

    Is there a reason why you only considered the one feature based on protein 3D structure?

  4. (among which de novo designed proteins)

    I think this parenthetical phrase may be missing a word. I suggest "(including de novo designed proteins)" or "(some of which are de novo designed proteins)."

  5. We focused on single amino acid substitutions and computed the corresponding changes in folding free energy upon mutations, defined as ΔΔG = ΔG_mt − ΔG_wt, where wt and mt stand for “wild-type” and for “mutant” respectively.

    Have you considered evaluating how a reference-free approach (as described in the paper linked below) might influence your findings? When everything is evaluated with respect to the wild-type, it can be difficult to disentangle local epistatic effects from global nonlinearity in the sequence-function relationship.

    https://doi.org/10.1038/s41467-024-51895-5

  6. the latter being the primary driver of evolution [5, 6, 7]

    While stability is certainly important, I would suggest tempering this statement slightly, for instance by emphasizing instead that protein stability is one of the strongest forces constraining molecular evolution.
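
The distinction raised in comment 5 can be illustrated with a small sketch that contrasts a wild-type-referenced ΔΔG with a simplified reference-free first-order effect, taken here as the mean over all genotypes carrying a state minus the global mean. The two-site library, its ΔG values, and the function names are hypothetical. The reference-free calculation is a simplified reading of the idea, not a reproduction of the linked paper's method.

```python
from statistics import mean

# Hypothetical folding free energies (kcal/mol) for a complete two-site,
# two-state library. Genotype strings give the state at site 0 and site 1.
dG = {"AL": -3.0, "VL": -2.0, "AF": -2.5, "VF": -0.5}
WILD_TYPE = "AL"


def reference_based_ddg(dG, wild_type, position, mutant_state):
    """ddG = dG_mt - dG_wt for a single substitution in the wild-type background."""
    mutant = wild_type[:position] + mutant_state + wild_type[position + 1:]
    return dG[mutant] - dG[wild_type]


def reference_free_effect(dG, position, state):
    """Mean dG of all genotypes carrying `state` at `position`, minus the global mean.

    The effect is averaged over every background in the library instead of
    being measured in one designated reference sequence.
    """
    global_mean = mean(dG.values())
    with_state = [value for genotype, value in dG.items() if genotype[position] == state]
    return mean(with_state) - global_mean


ddg_V = reference_based_ddg(dG, WILD_TYPE, 0, "V")  # wild-type background only
ddg_F = reference_based_ddg(dG, WILD_TYPE, 1, "F")

# Deviation of the double mutant from additivity, as seen from the wild-type.
pairwise_epistasis = dG["VF"] - dG[WILD_TYPE] - ddg_V - ddg_F

rf_V = reference_free_effect(dG, 0, "V")  # averaged over both site-1 states
rf_F = reference_free_effect(dG, 1, "F")

print(ddg_V, ddg_F, pairwise_epistasis, rf_V, rf_F)
```

In this toy library the effect attributed to V at site 0 differs between the two views because the reference-free estimate averages over both backgrounds at site 1. The apparent size of single-mutant effects and of epistasis can therefore depend on the choice of reference.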