Accurate protein stability prediction for small domains using mega-scale experiments

Yehlin Cho
Kotaro Tsuboyama
Theodore J. Litberg
Michelle D. Jung
Adunoluwa Obisesan
Qian Wang
Claire M. Phoumyvong
Jane Thibeault
Sergey Ovchinnikov
Gabriel J. Rocklin

Abstract

Predicting absolute protein folding stability is a long-standing challenge in biophysics, with broad applications in protein design and in understanding genetic variation and evolution. Physics-based simulations have shown limited success at predicting stability and are often computationally intractable, and machine learning methods have been constrained by the lack of sufficiently large experimental datasets. We recently introduced cDNA display proteolysis, a cell-free approach that can measure folding stability for nearly one million protein domains in parallel. Here, we applied this method to measure stability for 1.8 million diverse protein domains 60-80 amino acids in length primarily taken from the MGnify metagenomic database and spanning over 200,000 sequence families. Using this new “MGnify Stability dataset”, we developed the predictive models SaProtΔG and ESM3ΔG, which accurately predict absolute folding stability for small domains with root mean squared error of 0.8 kcal/mol over a 6 kcal/mol range (Spearman rank correlation of 0.88). These predictors show high accuracy at predicting effects of substitutions, insertions, and deletions, successfully identify global trends toward higher stability in thermophilic organisms, and improve discrimination of stable and unstable computationally designed proteins. Our results illustrate how megascale biophysical measurements can complement existing evolutionary and structural data to enable accurate absolute stability prediction for small domains.
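The two accuracy figures quoted in the abstract, root mean squared error and Spearman rank correlation, can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names and the five stability values below are made up for illustration, and the sketch assumes that predicted and measured stabilities are simple paired lists of values in kcal/mol.

```python
import math


def rmse(predicted, measured):
    """Root mean squared error between paired predictions and measurements."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(predicted, measured):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(measured))


# Hypothetical stabilities in kcal/mol, invented for this example only.
measured = [0.5, 1.2, 2.8, 3.9, 4.6]
predicted = [0.9, 1.0, 2.1, 4.4, 4.2]

print("RMSE (kcal/mol):", round(rmse(predicted, measured), 3))
print("Spearman:", round(spearman(predicted, measured), 3))
```

RMSE is expressed in the same units as the stabilities and penalizes large individual errors, while the Spearman correlation depends only on how well the predicted ordering matches the measured ordering, which is why the two are reported together.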

Version published to 10.64898/2026.05.19.726285 on bioRxiv
May 20, 2026

Related articles

Evolutionary constraints improve protein large language model predictions for protein stability, binding regions and epistasis

This article has 3 authors:
1. Konstantina Tzavella
2. Catharina Olsen
3. Wim Vranken
This article has no evaluations
Latest version May 26, 2026

Deep Learning Structural Ensembles as Proxies for Protein Flexibility

This article has 3 authors:
1. Mehmet Tahir Tunc
2. Ayten Dizkirici Tekpinar
3. Mustafa Tekpinar
This article has no evaluations
Latest version May 18, 2026

Just Add Structure: Protein Language Models Combined with Structural Equivariance Excel at Protein Tasks

This article has 5 authors:
1. Qurat-ul-ain
2. Carlos Outeiral
3. Matteo Cagiada
4. Yee Whye Teh
5. Charlotte M. Deane
This article has no evaluations
Latest version May 29, 2026
