Learning millisecond protein dynamics from what is missing in NMR spectra
Abstract
Many proteins' biological functions rely on interconversions between multiple conformations occurring at micro- to millisecond (μs-ms) timescales. A lack of standardized, large-scale experimental data has hindered obtaining a more predictive understanding of these motions. After curating >100 Nuclear Magnetic Resonance (NMR) relaxation datasets, we realized an observable for μs-ms dynamics might be hiding in plain sight. Millisecond dynamics can cause NMR signals to broaden beyond detection, leaving some residues not assigned in the chemical shift datasets of ~10,000 proteins deposited in the Biological Magnetic Resonance Data Bank (BMRB). We made the bold assumption that residues missing assignments are exchange-broadened due to μs-ms motions and trained various deep learning models to predict missing assignments. Strikingly, these models also predict exchange measured via NMR relaxation experiments, indicative of μs-ms dynamics. The best of these models, which we named Dyna-1, leverages an intermediate layer of the multimodal language model ESM-3. Notably, dynamics directly linked to biological function — including enzyme catalysis and ligand binding — are particularly well predicted by Dyna-1, which parallels our findings that residues experiencing μs-ms exchange are more conserved. We anticipate the datasets and models presented here will be transformative in unlocking the common language of dynamics and function.
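The labeling assumption described above (a residue without a chemical shift assignment is treated as a candidate for exchange broadening) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `missing_assignment_labels`, the 1-based position convention, and the toy sequence are all hypothetical. A real workflow would also need to parse BMRB entries and decide how to treat residues that lack assignments for other reasons.

```python
from typing import Iterable, List


def missing_assignment_labels(sequence: str,
                              assigned_positions: Iterable[int]) -> List[int]:
    """Return one label per residue: 1 if it has no assignment, else 0.

    Hypothetical helper for illustration only. Positions are 1-based
    to match common residue numbering.
    """
    assigned = set(assigned_positions)

    # Reject positions that do not exist in the sequence.
    out_of_range = sorted(p for p in assigned if not 1 <= p <= len(sequence))
    if out_of_range:
        raise ValueError(f"positions outside sequence: {out_of_range}")

    # Assumption from the abstract: a missing assignment is the positive class.
    return [0 if pos in assigned else 1
            for pos in range(1, len(sequence) + 1)]


# Toy example with made-up data: residues 4 and 5 have no assignment.
labels = missing_assignment_labels("MKTAYIAK", [1, 2, 3, 6, 7, 8])
print(labels)  # [0, 0, 0, 1, 1, 0, 0, 0]
```

Per-residue binary vectors of this kind are the sort of target a sequence model could be trained to predict, which is the general setup the abstract describes.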