How good is generative diffusion model for enhanced sampling of protein conformations across scales and in all-atom resolution?

Abstract

Molecular dynamics (MD) simulations are fundamental for probing the structural dynamics of biomolecules, yet their efficiency is limited by the high computational cost of exploring long-timescale events. Generative machine learning (ML) models, particularly the Denoising Diffusion Probabilistic Model (DDPM), offer an emerging strategy to enhance conformational sampling. In this study, we evaluate the capabilities and limitations of DDPM in generating atomistically accurate conformational ensembles across proteins of varying size and structural order. These range from the 20-residue folded Trp-cage and the 58-residue BPTI to the 83-residue intrinsically disordered region Ash1 and the 140-residue intrinsically disordered protein (IDP) α-Synuclein. After training DDPM on relatively short MD trajectories, using both torsion-angle and all-atom coordinate data, we show that it reproduces key structural features such as secondary structure, radius of gyration, and contact maps, while effectively sampling sparsely populated regions of the conformational landscape. Notably, DDPM can also generate novel conformations, including transitions not explicitly observed in the training data. However, the model occasionally overlooks low-probability regions and may produce conformers of unclear physical relevance, which warrant independent validation. These limitations are particularly evident in flexible systems such as IDPs. Overall, this work benchmarks DDPM as a viable tool for augmenting MD simulations, offering enhanced sampling at a substantially lower computational cost, although it can miss low-population conformers. It also highlights the importance of rigorous validation and careful interpretation when deploying generative models in computational biophysics.
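The abstract does not specify the network architecture, noise schedule, or feature encoding used in the study. As a rough illustration of how a DDPM operates on torsion-angle data, the following minimal sketch assumes a linear noise schedule with the original DDPM defaults (T = 1000, β from 1e-4 to 0.02), a (sin, cos) encoding of each angle to handle periodicity, and random placeholder angles in place of real MD frames. All function names are hypothetical. It covers only the closed-form forward noising step and the noise-prediction training target, not a trained denoiser or the authors' pipeline.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (assumed; mirrors the original DDPM defaults)."""
    return np.linspace(beta_start, beta_end, T)

def encode_torsions(angles_rad):
    """Map periodic torsion angles (radians) to (sin, cos) features so +pi and -pi coincide."""
    return np.concatenate([np.sin(angles_rad), np.cos(angles_rad)], axis=-1)

def decode_torsions(features):
    """Invert encode_torsions with arctan2; returns angles in (-pi, pi]."""
    n = features.shape[-1] // 2
    return np.arctan2(features[..., :n], features[..., n:])

def q_sample(x0, t, alpha_bar, rng):
    """Closed-form forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bar[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps, eps

def noise_prediction_loss(eps_pred, eps):
    """Simplified DDPM objective: mean squared error between predicted and true noise."""
    return np.mean((eps_pred - eps) ** 2)

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention, decreasing in t

rng = np.random.default_rng(0)
# Hypothetical batch: 4 frames with 36 torsion angles each (placeholder, not MD data).
angles = rng.uniform(-np.pi, np.pi, size=(4, 36))
x0 = encode_torsions(angles)  # shape (4, 72)

t = 500
xt, eps = q_sample(x0, t, alpha_bar, rng)

# A denoising network would learn to predict eps from (xt, t). With a perfect
# prediction, the clean features can be recovered exactly from xt:
x0_hat = (xt - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
```

At sampling time, a trained network would run this process in reverse, starting from pure Gaussian noise, and `decode_torsions` would map the generated features back to angles. The (sin, cos) encoding is one common way to respect angular periodicity; the abstract does not say whether the study uses it.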