Heterogeneity of the GFP fitness landscape and data-driven protein design

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    The manuscript examines how the robustness of protein structure and function to mutation or polymorphism varies across evolutionary distance. The work indicates that evolutionarily related genes can differ in their robustness to variation, and that these differences do not necessarily track phylogenetic relationships. The conclusions have potential ramifications for protein engineering and protein structure, as well as for population genetics and phylogenetics.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

This article has been Reviewed by the following groups

Abstract

Studies of protein fitness landscapes reveal biophysical constraints guiding protein evolution and empower prediction of functional proteins. However, generalisation of these findings is limited by the scarcity of systematic data on fitness landscapes of proteins with a defined evolutionary relationship. We characterized the fitness peaks of four orthologous fluorescent proteins with a broad range of sequence divergence. While two of the four studied fitness peaks were sharp, the other two were considerably flatter, being almost entirely free of epistatic interactions. Mutationally robust proteins, characterized by a flat fitness peak, were not optimal templates for machine-learning-driven protein design; instead, predictions were more accurate for fragile proteins with epistatic landscapes. Our work provides insights for the practical application of fitness landscape heterogeneity in protein engineering.

Article activity feed

  2. Reviewer #1 (Public Review):

    The authors aim to test how mutational robustness is maintained across different variants of a single protein skeleton. The assumption has been that similarity in structure will equate to similarity in robustness to change.

    The work uses four proteins at different levels of robustness and exploits the unique properties of fluorescent proteins to assay the function of a massive mutant collection. This gives exquisite sampling immediately around each of the four proteins' starting sequences but leaves the more distant spaces between the proteins unsampled.

    The partial connection of mutational robustness/fitness optima to environmental robustness is intriguing as it suggests that overly-optimized proteins will end up being less robust to environmental or mutational changes. The partial nature of this suggests that environmental and mutational robustness may be independently selectable parameters in protein design.

  3. Reviewer #2 (Public Review):

    This manuscript experimentally measured the effects of mutations in a large number of variants of four green fluorescent proteins (GFPs) and compared the topology of the protein fitness landscape among the four GFPs. The authors performed various biophysical experiments to obtain thermodynamic and kinetic stability data for the GFPs, in order to explain the differences in mutational response (robustness) among them. Further, the authors fit the experimental data with various models to identify mutational epistasis in each GFP. The authors then tested their models by synthesizing and characterizing genes that contain up to 48 mutations. Interestingly, the prediction was mostly successful using a highly fragile GFP template, as the experimental data exposed epistasis, while such information was not extracted from highly robust templates. The authors suggest that their platform is useful for designing and predicting new functional proteins.
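
    As a rough illustration of what comparing mutational robustness between templates can look like, the sketch below computes the fraction of variants that remain functional at each number of mutations. The function name, the fluorescence threshold, and the toy data are all hypothetical and are not taken from the manuscript; the authors' actual analysis may differ.

```python
from collections import defaultdict


def fraction_functional(variants, threshold):
    """Return {n_mutations: fraction of variants with fluorescence >= threshold}.

    `variants` is an iterable of (n_mutations, fluorescence) pairs.
    The threshold separating "functional" from "non-functional" is an
    assumption of this sketch, not a value from the manuscript.
    """
    totals = defaultdict(int)
    functional = defaultdict(int)
    for n_mut, fluorescence in variants:
        totals[n_mut] += 1
        if fluorescence >= threshold:
            functional[n_mut] += 1
    return {n: functional[n] / totals[n] for n in sorted(totals)}


# Made-up toy measurements: (number of mutations, relative fluorescence).
toy_variants = [(1, 0.95), (1, 0.90), (2, 0.85), (2, 0.20), (3, 0.10), (3, 0.05)]
curve = fraction_functional(toy_variants, threshold=0.5)
print(curve)
```

    Under these assumptions, a robust template would show a curve that decays slowly as mutations accumulate, while a fragile template would show a steep drop.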

    This is an interesting study that has generated a massive amount of experimental data by deep mutational scanning of four GFP templates (with one data set from the authors' previous publication). The scale of the data is unprecedented. The neural network approach is novel. However, some of the overall findings, e.g., the absence of correlation between sequence distance and mutational robustness, are not particularly new or surprising, and the authors overstate some of their findings. Some of the data description in the manuscript is unclear and can be improved.

  4. Reviewer #3 (Public Review):

    Somermeyer and coauthors performed a large-scale mutational analysis of four homologues of the green fluorescent protein (GFP). They show that two homologues were more resistant to the accumulation of mutations than the other two, and that this mutational robustness was related to a decreased number of negative epistatic interactions between mutations, rather than to reduced fitness effects of individual mutations. The authors then related mutational effects to the structures and biophysical properties of the proteins, finding the expected relationship between the effects of mutations on predicted protein stability and their effect on fluorescence. Finally, they used the data to train neural network models and design new GFP variants which retained near-wild-type function, despite differing from the original sequence by as many as 48 mutations.
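
    The distinction drawn above between individual mutational effects and epistatic interactions can be made concrete with a minimal sketch. It assumes the common additive definition of pairwise epistasis (the deviation of a double mutant from the sum of the single-mutant effects) on an arbitrary fitness scale; the function name and numbers are hypothetical, and the authors' own models may define epistasis differently.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation.

    All arguments are fitness values on the same scale (for example,
    log fluorescence; the choice of scale is an assumption here).
    A negative result means the double mutant is worse than expected
    from the two single mutants (negative epistasis).
    """
    effect_a = f_a - f_wt  # effect of mutation A alone
    effect_b = f_b - f_wt  # effect of mutation B alone
    expected_ab = f_wt + effect_a + effect_b  # additive expectation
    return f_ab - expected_ab


# Made-up example: two mildly deleterious mutations that together
# nearly abolish function.
eps = pairwise_epistasis(f_wt=1.0, f_a=0.9, f_b=0.8, f_ab=0.2)
print(f"epistasis = {eps:.2f}")
```

    In this toy case each single mutation has a small effect, yet the pair is far worse than additive, which is the kind of pattern that makes a fitness peak sharp without requiring large individual effects.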

    This is an excellent study. The manuscript is clearly written, the experimental and analytical methods are state-of-the-art, and the conclusions are convincing and have important implications in the areas of biotechnology and molecular evolution. The neural network approach for the prediction of mutational effects and for the design of new variants works surprisingly well, as judged by its ability to produce distant, but fully functional variants of GFP.