Improved data-driven collective variables for biased sampling through iteration on biased data

Abstract

Our ability to efficiently sample conformational transitions between two known states of a biomolecule using collective variable (CV) based sampling depends strongly on the choice of CV. We previously reported a data-driven approach to clustering biomolecular configurations with a probabilistic clustering model termed ShapeGMM. ShapeGMM is a Gaussian mixture model in Cartesian coordinates, in which the mean and covariance of each cluster represent the harmonic approximation to the conformational ensemble around a metastable state. We subsequently showed that Linear Discriminant Analysis on positions (posLDA) yields a good reaction coordinate for characterizing the transition between two of these states, and that this coordinate can be biased to produce transitions between the states using Metadynamics-like approaches. However, the quality of these LDA coordinates depends on the amount of data used to characterize the states. Here we demonstrate that they can be systematically improved using enhanced sampling data. Specifically, improved CVs for sampling can be generated by iteratively performing biased sampling along a posLDA coordinate and then fitting a new ShapeGMM model to the biased data from the previous iteration. The new coordinates derived from this iterative approach are substantially better at inducing transitions between metastable states and at converging a free energy surface.
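To make the idea of an LDA coordinate on positions concrete, the following is a minimal sketch of two-class Fisher LDA applied to flattened Cartesian coordinates. It is not the authors' implementation. It assumes the frames are already aligned to a common reference (the abstract does not describe the alignment), it uses synthetic Gaussian data in place of simulation frames, and the function names `lda_direction` and `pos_lda_cv` are hypothetical. It also omits the frame reweighting that biased data would require.

```python
import numpy as np

def lda_direction(frames_a, frames_b, ridge=1e-6):
    """Two-class Fisher LDA on flattened positions.

    frames_a, frames_b: arrays of shape (n_frames, n_atoms, 3), assumed
    to be pre-aligned. Returns a unit vector w and the midpoint between
    the two state means. `ridge` is an assumed small regularizer.
    """
    xa = frames_a.reshape(len(frames_a), -1)
    xb = frames_b.reshape(len(frames_b), -1)
    mu_a, mu_b = xa.mean(axis=0), xb.mean(axis=0)
    # Pooled within-state covariance
    sw = (np.cov(xa, rowvar=False) * (len(xa) - 1)
          + np.cov(xb, rowvar=False) * (len(xb) - 1))
    sw /= len(xa) + len(xb) - 2
    sw += ridge * np.eye(sw.shape[0])  # keep the solve well conditioned
    # Fisher discriminant: w is proportional to Sw^{-1} (mu_a - mu_b)
    w = np.linalg.solve(sw, mu_a - mu_b)
    return w / np.linalg.norm(w), 0.5 * (mu_a + mu_b)

def pos_lda_cv(frames, w, center):
    """Project frames onto the LDA vector to get a scalar CV per frame."""
    x = frames.reshape(len(frames), -1)
    return (x - center) @ w

# Synthetic stand-in for two metastable states of a 4-atom system
rng = np.random.default_rng(0)
mean_a = rng.normal(size=(4, 3))
mean_b = mean_a + rng.normal(scale=0.5, size=(4, 3))
frames_a = mean_a + 0.1 * rng.normal(size=(200, 4, 3))
frames_b = mean_b + 0.1 * rng.normal(size=(200, 4, 3))

w, center = lda_direction(frames_a, frames_b)
cv_a = pos_lda_cv(frames_a, w, center)
cv_b = pos_lda_cv(frames_b, w, center)
```

In this sketch the CV is positive on average for state A and negative on average for state B. In the iterative scheme described above, the two frame sets would be replaced at each round by clusters fitted to the biased trajectory from the previous round.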
