ExEnDiff: An Experiment-guided Diffusion model for protein conformational Ensemble generation

Abstract

Understanding protein conformations is key to understanding protein function. Importantly, most proteins adopt multiple conformations, with non-trivial ensemble distributions that change depending on the environment, to perform functions such as catalysis, signaling, and transport. Recently, machine learning techniques, especially deep generative models, have been employed to develop protein conformation generators. These models, known as unified protein ensemble samplers, are trained on the Protein Data Bank (PDB) and can generate diverse protein conformation ensembles given a protein sequence. However, their sole reliance on structural data from the PDB, which primarily captures folded protein states, restricts the diversity of the generated ensembles and can result in physically unrealistic conformations. In this paper, we overcome these challenges by introducing ExEnDiff, an experiment-guided diffusion model for protein conformation generation. ExEnDiff integrates experimental measurements as a physical prior, enabling the generation of protein conformations with desired properties. Our experiments on a variety of fast-folding and intrinsically disordered proteins demonstrate that ExEnDiff significantly advances the capabilities of current unified protein ensemble samplers. At little computational cost, ExEnDiff can capture important configurational properties of proteins and the underlying Boltzmann distribution, paving the way for a next-generation molecular dynamics engine. We further demonstrate that ExEnDiff can capture conformational changes in the presence of mutations and can serve as an efficient tool for determining a reasonable collective variable (CV) space for protein ensembles. With these results, ExEnDiff is well poised to push the study of protein ensembles into a data-rich regime currently available to few problems in biology.
