AlphaFold-SFA: accelerated sampling of cryptic pocket opening, protein-ligand binding and allostery by AlphaFold, slow feature analysis and metadynamics

Abstract

Sampling rare events in proteins is crucial for comprehending complex phenomena like cryptic pocket opening, where transient structural changes expose new binding sites. Understanding these rare events also sheds light on protein-ligand binding and allosteric communication, where interactions at distant sites influence protein function. Traditional unbiased molecular dynamics simulations often fail to sample such rare events, as the free energy barrier between metastable states is large relative to the thermal energy. This renders these events inaccessible on the timescales typically simulated by standard molecular dynamics, limiting our understanding of these critical processes. In this paper, we propose a novel unsupervised learning approach, termed slow feature analysis (SFA), which aims to extract slowly varying features from high-dimensional temporal data. SFA trained on short unbiased molecular dynamics simulations launched from AlphaFold-generated conformational ensembles captures rare events governing cryptic pocket opening, protein-ligand binding, and allosteric communication in a kinase. Metadynamics simulations using SFA components as collective variables sample ‘deep’ cryptic pocket opening within a few hundred nanoseconds, which was beyond the reach of microsecond-long unbiased molecular dynamics simulations. SFA-augmented metadynamics also captured accelerated ligand binding/unbinding and provided novel insights into the allosteric communication in receptor-interacting protein kinase 2 (RIPK2) that dictates protein-protein interaction. Taken together, our results show how SFA acts as a dimensionality reduction tool that bridges the gap between AlphaFold, molecular dynamics simulation and metadynamics in the context of capturing rare events in biomolecules, extending the scope of structure-based drug discovery in the era of AlphaFold.
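
To make the idea of "slowly varying features" concrete, the following is a minimal sketch of linear SFA in NumPy. It is not the authors' implementation: the function name `linear_sfa`, the toy two-signal data set and the eigenvalue cutoff are assumptions chosen for illustration. The sketch whitens the input features, then finds the directions along which the frame-to-frame difference of the whitened signal has the smallest variance.

```python
import numpy as np

def linear_sfa(X, n_components=1):
    """Linear slow feature analysis (illustrative sketch).

    X is a (n_frames, n_features) time series. Returns the projected
    slow features, the projection matrix and the slowness values.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature

    # Step 1: whiten so that every direction has unit variance.
    cov = np.atleast_2d(np.cov(Xc, rowvar=False))
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10 * evals.max()           # assumed cutoff for near-null directions
    W_white = evecs[:, keep] / np.sqrt(evals[keep])
    Z = Xc @ W_white

    # Step 2: covariance of the discrete time derivative of the whitened data.
    dZ = np.diff(Z, axis=0)
    dcov = np.atleast_2d(np.cov(dZ, rowvar=False))

    # Step 3: eigh returns eigenvalues in ascending order, so the first
    # eigenvectors are the slowest directions.
    d_evals, d_evecs = np.linalg.eigh(dcov)
    W = W_white @ d_evecs[:, :n_components]
    return Xc @ W, W, d_evals[:n_components]

# Toy data (hypothetical): a slow and a fast oscillation, linearly mixed.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
slow = np.sin(t)
fast = np.sin(40.0 * t)
X = np.column_stack([slow + fast, slow - 0.5 * fast])

y, W, slowness = linear_sfa(X, n_components=2)
corr_with_slow = abs(np.corrcoef(y[:, 0], slow)[0, 1])
```

In this toy case the first SFA component should recover the slow oscillation up to sign and scale. In a setting like the one described in the abstract, `X` would instead hold structural descriptors computed per trajectory frame, and the leading components would be candidates for collective variables.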

Article activity feed

  1. Consolidated peer review report (24 May 2024)

    GENERAL ASSESSMENT

    The preprint by Vats et al. (2023) introduces a methodology of applying slow feature analysis (SFA) to AlphaFold ensembles, with the goal of identifying collective variables for subsequent MD simulations.

    The study aims to leverage AlphaFold's predictive capabilities to enhance understanding of protein dynamics and rare events, such as cryptic pocket opening, protein-ligand binding/unbinding, and allosteric modulation. By integrating AlphaFold predictions with molecular dynamics (MD) simulations and slow feature analysis (SFA), the objective is to develop a comprehensive framework for efficiently sampling and analyzing these critical molecular events in ways that might not be sampled by classical simulations or using traditional collective variables alone.

    Key findings are that AlphaFold-generated structural ensembles provide useful initial conformations that capture essential conformational heterogeneity. The study demonstrates the utility of AlphaFold in seeding MD simulations to capture rare events, such as the flipping of key residues necessary for cryptic pocket opening in plasmepsin II. In the RIPK2 test case, conformational alterations in the activation loop and DFG moiety elucidate their roles in protein function and interactions relevant to inflammatory diseases. Integration of SFA with metadynamics allows for the efficient sampling of rare events within a shorter simulation time compared to traditional methods, thereby accelerating the exploration of protein dynamics. Generally, their approach works for sampling relevant conformational changes in both side chains and backbones for at least two test cases.

    Strengths of this work include the well-written description of the SFA method and the demonstration of success in two distinct cases. Integrating AlphaFold with computational methods like SFA and metadynamics is in principle a powerful approach to studying protein dynamics and functional mechanisms, with potential applications in drug discovery and disease understanding. This study showcases the synergy between AI-based protein structure prediction and computational biology, facilitating more comprehensive and efficient exploration of protein dynamics and interactions.

    Weaknesses include a lack of clarity as to what input data are required to run the method, what criteria were used to measure success, and what specifically is learned by the application of SFA, as well as some missing figure captions and citations. A general concern about applicability is the use of vanilla AlphaFold predictions as starting points for molecular dynamics, for example given the tendency of the AlphaFold inference system to bias towards states with more contacts.

    RECOMMENDATIONS

    The manuscript in its current state is convincing in presenting the method, but could benefit from reorganization and streamlining to more directly expose the relevant results to the reader.

    Essential revisions:

    1. The manuscript is not clear in defining what data are required to run this method. The initial modeling is done with AlphaFold, which requires only an input sequence. However, the pipelines for the two main test cases are quite different, with two 40-ns simulations for each of the 80 AlphaFold models of plasmepsin-II, but ten 20-ns simulations for each of 32 AlphaFold models of RIPK2; no explanation is given for these different parameterizations. More importantly, for plasmepsin-II the metadynamics simulations were executed on PDB structures instead of AlphaFold models, implying that such structures are in fact necessary. It is not clear what starting structure was used for the metadynamics simulations of RIPK2. The authors should clearly state whether they believe experimental structures are necessary as metadynamics inputs for this method to work, as it is an important consideration for prospective users.
    2. Confidence in AlphaFold-generated models should be analyzed, or at least discussed. The underlying assumption regarding the presence of conformational diversity in the generated ensemble is speculative at best. The proposed method could be tested in protein systems with known conformational states as references to validate sampling; methods like AFcluster [1] or SPEACH_AF [2] could enhance diversity.
    3. There are relatively few details about what specifically is learned by the application of SFA. Although the details of the approach for the given systems are clearly described in Methods, there is little description of its applicability to other fields. In Results, figures S3A and S8 aim to explain the learned features for plasmepsin-II and RIPK2 respectively, but it is not clear what these features are, or what exactly is communicated. The x-axes are particularly confusing, as they seem to indicate a sequential index of features that does not correspond to amino acids (for example, the text refers to Phe165 in RIPK2, but the x-axes in S8 end around 150). Although machine-learning-derived features are often difficult to explain, it would help to clarify the x-axis titles, and add qualitative descriptions to the text and/or captions. There are also few examples of how metadynamics with SFA-picked CVs compares to traditional metadynamics with hand-picked CVs. Figures 8E/F, S12, and S13 show that this method captures transitions of RIPK2 between the two states of interest, while unbiased simulations do not; but otherwise, the authors rely on prior publications to illustrate advantages of their approach in uncovering cryptic pockets.
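
    The simulation counts quoted in point 1 can be tallied with a few lines of arithmetic. The helper name `aggregate_ns` is hypothetical, and the inputs are only the figures stated in this report (80 models with two 40-ns runs each, and 32 models with ten 20-ns runs each).

```python
def aggregate_ns(n_models, runs_per_model, ns_per_run):
    """Total unbiased sampling time in nanoseconds (hypothetical helper)."""
    return n_models * runs_per_model * ns_per_run

# Figures as quoted in point 1 of the essential revisions.
plasmepsin_ns = aggregate_ns(n_models=80, runs_per_model=2, ns_per_run=40)
ripk2_ns = aggregate_ns(n_models=32, runs_per_model=10, ns_per_run=20)

print(f"plasmepsin-II: {plasmepsin_ns} ns aggregate")
print(f"RIPK2:         {ripk2_ns} ns aggregate")
```

    On these figures the two protocols amount to the same aggregate sampling time and differ in how that time is divided among seeds and replicas, which is the choice the report asks the authors to explain.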

    Optional suggestions:

    1. The tests being carried out to evaluate success should be clarified. In the case of plasmepsin-II, success was evaluated on the basis of chi-angle rotations of Trp41 and Tyr77; for RIPK2, the relevant residues were Phe165 and Trp170. However, these residues are only introduced (briefly) in Methods, then explained in somewhat more detail in Results. It would be helpful to add at least one or two sentences about these residues in the Introduction.
    2. The number of samples is an important factor in determining the extent of conformational diversity in the ensemble generated by AlphaFold. Optimizing this number for downstream metadynamics-SFA should expedite convergence, which is highly dependent on the initial states.
    3. In the simulations field, it is common to use time-lagged independent component analysis (TICA) to describe slow modes of motion. Could the authors compare their method with this previously established approach?
    4. Before training SFA, the authors performed parallel MD simulations starting from the AlphaFold-generated seeds. It would be nice to see what conformational space is covered during the initial unbiased MD simulations, to see what information is gained from these relative to the static starting positions of the AlphaFold models. Are these simulations connected in the space that is used to train SFA? If not, how could this affect the analysis?
    5. The description of RIPK2 on p. 14, particularly its biological relevance, seems out of place in Results. Consider moving some of this content to Introduction and/or Discussion.
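
    For readers unfamiliar with the TICA approach mentioned in optional suggestion 3, the following is a minimal NumPy sketch, assuming a symmetrized covariance estimator and a toy two-signal data set. The function name `tica` and all parameter values are illustrative and are not taken from the preprint or from any particular package. TICA looks for the linear combinations of features with the largest autocorrelation at a chosen lag time.

```python
import numpy as np

def tica(X, lag=1, n_components=1):
    """Time-lagged independent component analysis (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    A, B = Xc[:-lag], Xc[lag:]                   # frame pairs separated by the lag
    n = len(A)

    # Symmetrized instantaneous and time-lagged covariance matrices.
    c0 = (A.T @ A + B.T @ B) / (2 * n)
    ct = (A.T @ B + B.T @ A) / (2 * n)

    # Whiten with c0, then diagonalize the whitened lagged covariance.
    evals, evecs = np.linalg.eigh(c0)
    keep = evals > 1e-10 * evals.max()           # assumed cutoff for near-null directions
    W_white = evecs[:, keep] / np.sqrt(evals[keep])
    lam, V = np.linalg.eigh(W_white.T @ ct @ W_white)

    order = np.argsort(lam)[::-1]                # largest autocorrelation first
    W = W_white @ V[:, order[:n_components]]
    return Xc @ W, W, lam[order[:n_components]]

# Toy data (hypothetical): a slow and a fast oscillation, linearly mixed.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
slow = np.sin(t)
fast = np.sin(40.0 * t)
X = np.column_stack([slow + fast, slow - 0.5 * fast])

ics, W, autocorr = tica(X, lag=1, n_components=2)
corr_with_slow = abs(np.corrcoef(ics[:, 0], slow)[0, 1])
```

    As a general property of the linear algorithms, for a unit-variance signal the variance of the one-frame difference equals twice one minus the lag-one autocorrelation, so linear SFA and TICA at a lag of one frame rank directions very similarly. Any difference in practice would therefore come mainly from the lag time, the feature set and nonlinear extensions, which is what a comparison by the authors could clarify.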

    REVIEWING TEAM

    Reviewed by:

    Diego del Alamo, Investigator, GSK, Switzerland: protein design, deep learning

    Nandan Haloi, Postdoctoral Fellow, KTH Royal Institute of Technology, Sweden: molecular dynamics simulations, enhanced sampling, Markov state modeling

    Yogesh Kalakoti, Postdoctoral Fellow, Linköping University, Sweden: computational biology, large language models, structural bioinformatics

    Curated by:

    Rebecca J. Howard, Senior Researcher, Stockholm University, Sweden

    (This consolidated report is a result of peer review conducted by Biophysics Colab on version 2 of this preprint. Comments concerning minor and presentational issues have been omitted for brevity.)