Deep Learning Structural Ensembles as Proxies for Protein Flexibility
Abstract
Protein dynamics are essential to biological function, yet whether deep learning structure prediction models capture information about these dynamics remains an open question. In this study, we quantitatively assess the capacity of deep learning structure generation methods to predict protein flexibility by directly comparing residue-level mean squared fluctuation (MSF) profiles derived from predicted structural ensembles with experimental or simulation-informed flexibility profiles. We assembled four diverse benchmark datasets representing different types of structural information: 70 NMR ensembles, 43 pairs of X-ray crystal structures of proteins in two distinct conformational states, 82 high-resolution cryo-EM structures, and molecular dynamics simulations of 10 proteins. We used AlphaFold3, AlphaFold2, and RoseTTAFold to generate multiple structural models, applied ranksort normalization to place the profiles on a comparable scale, and quantified similarity primarily with cosine similarity and Pearson correlation. Our results show that flexibility profiles from deep learning-generated models agree well with the reference profiles, suggesting that fluctuations in these predicted ensembles can serve as effective proxies for protein flexibility. Notably, AlphaFold3 consistently produced the best results across the datasets. Flexibility prediction accuracy generally improves as the number of models increases up to 15, and our findings remain robust when terminal residues are excluded from the analysis. To facilitate broader application, we provide three publicly accessible Jupyter Notebooks that calculate MSF from deep learning outputs.
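The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' notebook code. It assumes that the models of an ensemble are already superimposed, that one atom per residue (for example C-alpha) is used, and that ranksort normalization means replacing each value by its rank scaled to [0, 1] with ties averaged. The function names `msf_profile`, `rank_normalize`, `cosine_similarity`, and `pearson_similarity` are hypothetical.

```python
import numpy as np


def msf_profile(coords):
    """Per-residue mean squared fluctuation of an ensemble.

    coords: array of shape (n_models, n_residues, 3), one atom per residue.
    Assumption: the models have already been superimposed.
    """
    coords = np.asarray(coords, dtype=float)
    mean_structure = coords.mean(axis=0)
    squared_dev = ((coords - mean_structure) ** 2).sum(axis=2)
    return squared_dev.mean(axis=0)


def rank_normalize(profile):
    """Rank-based scaling to [0, 1]; tied values share their average rank.

    Assumption: this is one plausible reading of "ranksort normalization".
    """
    profile = np.asarray(profile, dtype=float)
    order = profile.argsort(kind="stable")
    ranks = np.empty(len(profile))
    ranks[order] = np.arange(len(profile))
    for value in np.unique(profile):
        tied = profile == value
        ranks[tied] = ranks[tied].mean()
    return ranks / max(len(profile) - 1, 1)


def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def pearson_similarity(a, b):
    return float(np.corrcoef(a, b)[0, 1])


# Toy ensemble: 2 models, 3 residues (made-up coordinates, not real data).
toy_ensemble = np.array([
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 4.0, 0.0]],
])
predicted = rank_normalize(msf_profile(toy_ensemble))
```

In practice, `predicted` would be compared against a rank-normalized reference profile of the same length (for example from an NMR ensemble) using `cosine_similarity` and `pearson_similarity`.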