Prioritizing Stability-enhancing Mutations using a Protein Language Model in conjunction with Physics-Based Predictions

Abstract

Mutational protein engineering, as currently practiced in the biotechnology and pharmaceutical industries, is both tedious and expensive. Computationally driven protein design has the potential to expedite the process and to generate high-quality variants at lower cost. Using datasets comprising 174,824 mutations across 180 proteins, we benchmark a protein language model (PLM), Evolutionary Scale Modeling (ESM), alongside a physics-based method (PBM), namely molecular mechanics energies with generalized Born and surface area continuum solvation (MM/GBSA), as triaging tools for identifying and prioritizing target positions and specific mutations that improve protein thermodynamic stability. We found prediction biases in each method, but also determined that these biases can be mitigated by applying the two methods in a complementary manner. We propose a hybrid mutation prioritization and selection strategy that achieves better accuracy than either method alone. Through re-ranking, the combined prioritization strategy attained an overall average ROC AUC of 0.744 across the dataset, higher than MM/GBSA alone (0.684) or ESM log odds alone (0.597).
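The abstract does not spell out the re-ranking scheme, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple rank-averaging combination (a hypothetical stand-in), assumes that higher ESM log odds (log p(mutant) minus log p(wild type)) and more negative MM/GBSA ddG values both indicate stabilization, and scores each ranking by ROC AUC computed with the Mann-Whitney U statistic. The function names and the toy numbers are invented for illustration.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ascending ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via Mann-Whitney U: the probability that a stabilizing
    mutation (label 1) outscores a non-stabilizing one (ties count 0.5)."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both stabilizing and non-stabilizing examples")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def combined_score(esm_log_odds: Sequence[float],
                   mmgbsa_ddg: Sequence[float]) -> list[float]:
    """Hypothetical hybrid score: mean of the two normalized ranks.
    ASSUMPTION: negative ddG means stabilizing, so ddG is negated first."""
    n = len(esm_log_odds)
    esm_r = average_ranks(esm_log_odds)
    pbm_r = average_ranks([-d for d in mmgbsa_ddg])
    return [(a + b) / (2 * n) for a, b in zip(esm_r, pbm_r)]


# Toy data invented for illustration only (1 = stabilizing mutation).
labels = [1, 1, 0, 0, 1, 0]
esm = [0.5, -1.0, 0.2, -0.3, 1.2, -2.0]
ddg = [-2.0, -1.5, 0.5, 1.0, 0.3, -0.2]

print(round(roc_auc(labels, esm), 3))                       # 0.778
print(round(roc_auc(labels, [-d for d in ddg]), 3))         # 0.889
print(round(roc_auc(labels, combined_score(esm, ddg)), 3))  # 1.0
```

In this toy case each predictor misorders a different mutation, so averaging their ranks separates the classes completely; on real data the gain depends on how complementary the two methods' errors are, which is what the abstract reports evaluating.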