AF2χ: Predicting protein side-chain rotamer distributions with AlphaFold2

Abstract

The flexibility of protein side chains is an essential contributor to conformational entropy and affects processes such as folding, stability and molecular interactions. Structure determination experiments and prediction tools such as AlphaFold generally fail to capture or represent the conformational heterogeneity of proteins in solution. Experiments can be used to study side-chain flexibility, but cannot be applied at scale, and most prediction methods focus on reconstructing the minimum free energy state rather than an ensemble representing side-chain configurations. Here, we use AlphaFold2 and its internal side-chain representations to develop AF2χ, which predicts side-chain χ-angle distributions and generates structural ensembles. We extensively benchmark AF2χ predictions using experimental NMR ³J-couplings and S² order parameters, as well as dihedral angle distributions derived from collections of experimental structures, demonstrating that AF2χ generates accurate side-chain ensembles. We also compare the accuracy of AF2χ with molecular dynamics simulations and recent machine learning models that aim to generate conformational ensembles, and show that AF2χ provides state-of-the-art accuracy orders of magnitude faster than molecular simulations. With its speed and accuracy, AF2χ offers a strong complementary option to simulations and rotamer library approaches, making it particularly valuable for applications such as protein design, ligand docking and interpretation of biophysical experiments.

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/17049067.

    Overview:

    This is an innovative manuscript that explores how the inner workings of the AlphaFold 2 deep learning algorithm can be exploited to predict protein side-chain dihedral-angle (torsion-angle) distributions and to subsequently construct atomistic ensemble-style models consistent with those distributions.  The authors have developed an approach called AF2chi to, on a per-residue and per-chi basis, "mix" a discrete chi prediction from traditional AF2 output with Top8000 distributions from many crystal structures of many proteins, then to reweight the resulting chi distribution using an average of discrete chi angle predictions from AF2's black-box-esque "inner layers".  The method then uses these distributions for most chi angles (see comments below about chi1-2 vs. chi3-4) to construct an atomistic ensemble of the full protein (100 distinct models) that reflects the reweighted chi distributions with some accompanying small coordinate shifts to accommodate them (see comments below about clashes and backbone shifts).

    The authors show that the chi angle distributions from their method are consistent with various experimental and computational points of reference including NMR 3J coupling data, NMR order parameters, "HSP" ensembles of closely related crystal structures, and MD simulations.  The comparisons span various protein systems depending on what types of data are available for each, as appropriate.  The method does not require an extensive multiple sequence alignment and can work equally well with only a single sequence for input.  Notably, AF2chi is orders of magnitude faster than comparatively quite expensive MD simulations, yet achieves similar results by these metrics.

    Together, these qualities make AF2chi an attractive solution to modeling side-chain conformational heterogeneity for any protein!  Still, we note a few caveats and areas for improvement below.

    Major Comments:

    Our biggest area of criticism relates to the generation of the structural ensemble, which is not the focus of most of the manuscript.  How are chi1-2 values "sampled" for this stage?  Is it random from the reweighted distributions, independently for each residue and for each chi (chi1 and chi2) for each residue -- or in some other, perhaps more principled way?  This question is key because the answer has much to say about what information the resulting ensemble contains about coupling between spatially adjacent residues.  It seems clear that the AF2 outer layer, and inner layers for that matter, contain significant information about how the chi angles of spatially adjacent residues are coupled (otherwise presumably the protein interior could not be packed in a satisfactory way by AF2!).  However, it is less clear how much of this valuable information remains in the final structural ensemble, after the AF2 sequence-specific information is mixed with the Top8000 sequence-independent information.  
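
    To illustrate why the sampling scheme matters, here is a minimal sketch (not the authors' code) comparing a coupled pair of residues with an ensemble built by drawing each residue independently from its own marginal distribution. The joint populations, the three-well discretization, and all names are hypothetical and chosen only for illustration.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (nats) between the two axes of a joint table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    pa = joint.sum(axis=1, keepdims=True)   # marginal of residue A
    pb = joint.sum(axis=0, keepdims=True)   # marginal of residue B
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa * pb)[mask])))

# Hypothetical joint populations of chi1 wells (rows: residue A, columns:
# residue B) for two residues in contact; strongly coupled by construction.
joint = np.array([[0.30, 0.02, 0.02],
                  [0.02, 0.30, 0.02],
                  [0.02, 0.02, 0.28]])
pa, pb = joint.sum(axis=1), joint.sum(axis=0)

# Build an ensemble by sampling each residue independently from its marginal.
rng = np.random.default_rng(0)
n_models = 20000
a = rng.choice(3, size=n_models, p=pa / pa.sum())
b = rng.choice(3, size=n_models, p=pb / pb.sum())
counts = np.zeros((3, 3))
np.add.at(counts, (a, b), 1)

mi_joint = mutual_information(joint)          # coupling in the "true" ensemble
mi_independent = mutual_information(counts)   # coupling left after sampling
print(f"MI coupled: {mi_joint:.3f} nats, MI independent: {mi_independent:.4f} nats")
```

    In this toy case the per-residue distributions are reproduced, but the mutual information between the two residues drops to nearly zero. Any coupling in a real ensemble would then have to come from a later step such as clash rejection or relaxation.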

    Relatedly, we were surprised to read the following:  "In principle, correlations between side-chain conformations could be captured in the structure generation step of AF2χ, as structures with clashes are rejected; in practice we find that the acceptance rate of structures is close to 100%. This suggests that any remaining steric clashes may be resolved when the backbone structure is relaxed, in line with previous observations that small backbone movements help decouple side-chain motions (Davis et al., 2006; DuBay et al., 2011)."   This raises a few questions.

    First, is it possible that there are actually significant remaining clashes, but the clash detection method employed here is too lenient?  This could be examined by varying the clash detection threshold and repeating certain analyses.  This could have the additional interesting advantage of helping to dissect degrees of local allosteric coupling based on which clashes are more vs. less easily resolved by relaxation.
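
    A threshold scan of this kind could be sketched as follows. This is a simplified illustration, not the clash criterion used in the manuscript. It assumes a clash is a pair of non-bonded atoms whose van der Waals spheres overlap by more than a tolerance; the coordinates, radii, and the `excluded` set of bonded pairs are hypothetical inputs.

```python
import numpy as np

def count_clashes(coords, radii, tolerance, excluded=frozenset()):
    """Count atom pairs whose van der Waals overlap exceeds `tolerance` (in Å).

    `excluded` holds (i, j) index pairs with i < j, e.g. covalently bonded
    atoms, that should never be counted as clashes.
    """
    coords = np.asarray(coords, dtype=float)
    radii = np.asarray(radii, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    overlap = radii[:, None] + radii[None, :] - dist
    n_clashes = 0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if (i, j) not in excluded and overlap[i, j] > tolerance:
                n_clashes += 1
    return n_clashes

# Three toy carbon-like atoms (radius 1.7 Å); positions are made up.
coords = [(0.0, 0.0, 0.0), (3.2, 0.0, 0.0), (0.0, 2.9, 0.0)]
radii = [1.7, 1.7, 1.7]

# Scan from strict (any overlap) to lenient (only overlaps above 0.6 Å).
scan = {tol: count_clashes(coords, radii, tol) for tol in (0.0, 0.4, 0.6)}
print(scan)
```

    Repeating the acceptance-rate analysis across such a scan would show whether the near-100% acceptance is robust or an artifact of one cutoff. For reference, 0.4 Å is the overlap cutoff used in MolProbity's clashscore.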

    Second, is it possible that any initial clashes are indeed resolved, but at the expense of local geometry, which may become strained in terms of rotamericity (rotamer percentile), Ramachandran percentile, Cbeta deviations, bond length/angle deviations, etc.?  This would depend on how "aggressive" the relaxation method is, given the tight imposed restraints on chi angles (per the Methods).  To address this concern, all of these geometry metrics should be compared using MolProbity for (a) residues that start with a clash pre-relaxation that is resolved by relaxation vs. (b) residues that start with no clashes pre-relaxation.

    Third, do initial clashes happen more/less with reweighting vs. without reweighting?  (We're not sure if structural ensembles were generated without reweighting, but this could be attempted.)  This could provide insight as to the extent to which the inner layer information about side-chain flexibility contributes to the clashes, independent of / controlling for other aspects of the AF2chi pipeline such as the routine for sampling chi angles for structural ensemble generation.

    Fourth, are specific backbone adjustments such as the backrub actually responsible for resolving initial clashes after relaxation?  The authors should check their stated hypothesis on this matter for the same sets of (a) vs. (b) residues as listed above.

    Minor Comments:

    Overall, it is a bit disappointing that AF2chi vs. AF2chi prior (with vs. without reweighting based on the AF2 inner layers) yield similar results in many of the presented analyses.  However, to their credit the authors openly acknowledge this, and point out that the reweighting does not typically do any harm in individual cases, even if it may not help much overall / in the aggregate.  So there is likely still some (perhaps small) value in this step.

    What is the sequence identity threshold for constructing HSP ensembles?  Is this an important parameter?  Do the results of the comparisons presented here depend on this threshold choice?  This may also vary based on the characteristics of the HSP ensemble in other respects such as distribution of resolution, crystal symmetries, etc.

    Bootstrap sampling is mentioned a few times, but its meaning in each such context could be better explained.
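
    As one example of what could be stated explicitly, the sketch below shows a percentile bootstrap over residues for the RMSE between predicted and measured values. It assumes the resampling unit is the residue; the manuscript may resample something else (e.g. ensemble members), and the data here are synthetic placeholders.

```python
import numpy as np

def bootstrap_rmse_ci(pred, obs, n_boot=2000, alpha=0.05, seed=0):
    """Return (RMSE, lower, upper) with a percentile bootstrap interval."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    sq_err = (pred - obs) ** 2
    rng = np.random.default_rng(seed)
    # Each row is one bootstrap replicate: residues drawn with replacement.
    idx = rng.integers(0, len(sq_err), size=(n_boot, len(sq_err)))
    rmse_samples = np.sqrt(sq_err[idx].mean(axis=1))
    lower, upper = np.quantile(rmse_samples, [alpha / 2, 1 - alpha / 2])
    return float(np.sqrt(sq_err.mean())), float(lower), float(upper)

# Synthetic stand-in for 40 measured values and noisy predictions of them.
rng = np.random.default_rng(1)
obs = rng.uniform(2.0, 12.0, size=40)
pred = obs + rng.normal(0.0, 0.8, size=40)

rmse, lower, upper = bootstrap_rmse_ci(pred, obs)
print(f"RMSE = {rmse:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

    Stating the resampling unit, the number of replicates, and the interval type for each use of bootstrapping would make the reported uncertainties easier to interpret.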

    Fig. 1g-j are confusing in a few regards.  We think there may be an error in the order of inner layer / outer layer for Fig. 1g,h vs. Fig. 1i,j.  This is made more confusing by the use of the terms "inner" and "outer" to describe the parts of the circular plots, which are not conceptually related to the inner and outer layers of the algorithm… 

    Fig. 3c: Why are there apparently 2 distinct clusters?

    "For residues that sample multiple χ1-angle free-energy wells in the HSP ensemble, we found that AF2χ provides better agreement with the HSP ensemble than CHARMM36m MD simulations, suggesting that AF2χ can accurately capture the structural heterogeneity of dynamic side chains."  They look pretty similar in the plot; is this statistically significant?

    What about chi3 and chi4?  Are they handled analogously to chi1-2 within AF2chi?  The Methods mention that in the ensemble generation step only chi1-2 are sampled, so how are chi3-4 handled there?  Are there analogous ground-truth standards to compare against for these dihedral angles farther from the backbone?  This is not touched upon in the manuscript, unless we missed it -- but is obviously important for defining full side-chain conformations, including the termini of longer side chains that engage in various important interactions in protein structures.

    What would be the equivalent of AF2chi for the protein backbone?  Are the backbone dihedral angles phi, psi, and omega handled analogously to side-chain chi angles within AF2, in which case e.g. Top8000 Ramachandran distributions could be used to construct priors per residue?  Or would prediction of backbone heterogeneity need to take a different form due to the inner workings of AF2?

    (This preprint review stems from a journal club discussion in the Keedy lab at the CUNY Advanced Science Research Center on August 29, 2025.)

    Competing interests

    The author declares that they have no competing interests.

  2. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/16746471.

    Summary

    This paper introduces a new method to predict the distribution of protein side chain conformations from structures generated with AlphaFold2. While 'pseudoensembles', or sets of structures with highly similar sequences, can provide the distribution of side chain conformations, the authors concluded that at least 20 structures were needed to establish a 'good enough' distribution, so this approach is not available for every protein.

    Therefore, the authors asked whether AlphaFold2 could predict the side chain distributions. Their method, AF2χ, leverages the inner layers of AlphaFold2 to predict the distributions of side chain conformations. They then use Bayesian/maximum entropy (BME) reweighting, which combines the inner-layer AlphaFold2 predictions with existing rotamer libraries, to refine the predicted side chain distributions.
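
    For readers unfamiliar with BME, the idea can be sketched for a single observable. This is a generic illustration and not the authors' implementation. It assumes the usual BME form, where the new weights are proportional to the prior weights times exp(-λf) and λ satisfies the stationarity condition of the dual function with error σ and confidence parameter θ. The three rotamer wells, the observable cos(χ - 60°), the target value, σ, and θ are all hypothetical placeholders.

```python
import numpy as np

def bme_reweight(prior, f, target, sigma, theta, lam_range=(-50.0, 50.0), iters=200):
    """Reweight `prior` so the average of `f` moves toward `target`.

    Solves target + theta * sigma**2 * lam - <f>_w = 0 by bisection,
    where w_j is proportional to prior_j * exp(-lam * f_j).
    """
    prior = np.asarray(prior, dtype=float)
    prior = prior / prior.sum()
    f = np.asarray(f, dtype=float)

    def weights(lam):
        logw = np.log(prior) - lam * f
        logw -= logw.max()              # avoid overflow in exp
        w = np.exp(logw)
        return w / w.sum()

    def residual(lam):
        # Monotonically increasing in lam, so bisection is safe.
        return target + theta * sigma**2 * lam - weights(lam) @ f

    lo, hi = lam_range
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            hi = mid
        else:
            lo = mid
    return weights(0.5 * (lo + hi))

# Hypothetical prior populations of chi1 wells at 60, 180 and -60 degrees,
# with observable f = cos(chi - 60 deg) evaluated at each well.
prior = np.array([0.2, 0.3, 0.5])
f = np.array([1.0, -0.5, -0.5])
target = 0.4                            # placeholder "data" value

w = bme_reweight(prior, f, target, sigma=0.1, theta=1.0)
print("prior <f> =", prior @ f, " reweighted <f> =", w @ f, " weights =", w)
```

    A larger θ keeps the weights closer to the prior, while θ near zero forces the reweighted average onto the target. States with the same value of the observable keep their prior population ratio.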

    The authors tested their model's predictions in several ways. Side chain distributions were compared against nuclear magnetic resonance (NMR) J-couplings, molecular dynamics simulations, and S2-axis methyl NMR data, and were lastly benchmarked against new generative artificial intelligence methods: BioEmu with Hpacker, aSAMt, MDgen, and SeqDance.

    The authors' main finding is that Bayesian reweighting monotonically improves the model's accuracy, measured by correlation with NMR data. Ultimately, the new model performs at least as well as rotamer libraries/current prediction methods and sometimes better than traditional molecular dynamics simulations when compared to side chain ensemble distributions determined by NMR. However, the details and comparisons within the paper are relatively light. While the authors are limited to well-studied systems, the lack of quantitative data and comparisons with a larger set of side chain conformational distributions limits the conclusions that can be drawn about how widely applicable the findings are and about their impact.

    Major Revisions (please note, we used the biorxiv version to provide line numbers)

    • The claims of superior performance of AF2x require stronger quantitative backing. This is especially important when discussing the difference between the AF2x prior and AF2x. In many plots these look comparable. See lines 201 through 207, lines 230 through 232, lines 272 through 273, lines 285 through 287, lines 298 through 310, lines 329 through 337, lines 338 through 361, and figures 3 through 6.

    • The null model is presented only for S2-axis methyl data; developing a null model across all evaluated models would be beneficial, especially when comparing JS-divergence and J-couplings, to determine whether AF2x performs better than expected. See lines 231 through 240, lines 286 through 291, figure 5, and supplemental table 4.

    • The authors only use globular proteins when testing sidechain ensemble distributions. While the level of validation via NMR and MR is commendable, ultimately testing on more types of proteins, like membrane proteins, is important for helping to classify the differences in sidechain dynamics across proteins. All analyses should be rerun for other types of proteins, such as GPCRs. See lines 221 through 240 and lines 243 through 277.

    • Qualitatively explaining the structural differences between the HSP and AF2x ensembles will help readers understand the value the model provides. For example, do you see differences between buried and exposed side chains, between amino acid types, or for side chains that sample multiple rotamer wells in HSP? See lines 292 through 310.

    • While we understand that the authors are limited in comparisons to good NMR metrics, given that there was good correlation between HSP with more than 20 members and NMR metrics, and that there are many more 20+ HSPs that could be created, we encourage the authors to do a larger comparison to determine how widely AF2x can pick up on the side chain heterogeneity. 

    • While the authors briefly comment on timescale in comparison with methyl order parameters, we think this point deserves a second line in the conclusions. The authors are clear in stating that they are looking at side chain heterogeneity, but specifying that each method probes different but overlapping timescales would make clear to downstream users that AF2x describes heterogeneity and does not speak to dynamics.
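
    As an illustration of the null-model comparison requested above, the sketch below computes the Jensen-Shannon divergence between a predicted and a reference χ1 histogram and compares it with the divergence obtained from a trivial null. Taking a uniform distribution as the null is an assumption made here for simplicity; a residue-type rotamer library would be another natural choice. The histograms are hypothetical.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                    # 0 * log(0) terms contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))

# Hypothetical chi1 populations over three rotamer wells for one residue.
reference = np.array([0.70, 0.20, 0.10])     # e.g. from an HSP ensemble
predicted = np.array([0.60, 0.30, 0.10])     # e.g. from the model
null_uniform = np.ones(3) / 3                # assumed null model

js_pred = js_divergence(predicted, reference)
js_null = js_divergence(null_uniform, reference)
print(f"JS(predicted) = {js_pred:.3f} bits, JS(null) = {js_null:.3f} bits")
```

    Reporting both numbers per residue, or the distribution of their difference, would show how much of the agreement goes beyond what a sequence-independent baseline already achieves.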

    Minor Revisions

    • We suggest citing Wankowicz et al., 2022 eLife instead of Wankowicz & Fraser, 2024 when talking about the impact of side chain heterogeneity on ligand binding. See line 34.

    • Please add a citation to Karplus, 1963, "Vicinal Proton Coupling in Nuclear Magnetic Resonance", when referring to the Karplus equation in the HSP section. See lines 107 through 108.

    • In the model parameterization section, referring to Fig S3, please clarify that the decoy structure is AF2 in single sequence mode. See lines 173 through 181.
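
    For context on the Karplus equation mentioned in the minor revisions above, the relation has the form ³J(θ) = A cos²θ + B cos θ + C, and an ensemble value is the population-weighted average over the dihedral distribution. The sketch below is illustrative only: the coefficients A, B, C and the phase offset are placeholders, not the parameter set used in the manuscript.

```python
import numpy as np

def karplus(theta_deg, A, B, C):
    """Karplus relation: 3J = A cos^2(theta) + B cos(theta) + C."""
    theta = np.deg2rad(theta_deg)
    return A * np.cos(theta) ** 2 + B * np.cos(theta) + C

def ensemble_j(chi_deg, weights, A, B, C, phase_deg=0.0):
    """Population-weighted 3J over a distribution of chi angles."""
    chi_deg = np.asarray(chi_deg, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return float(np.sum(weights * karplus(chi_deg + phase_deg, A, B, C)))

# Placeholder coefficients in Hz; real values depend on the coupled nuclei.
A, B, C = 7.0, -1.0, 1.5

chi_grid = np.arange(-180.0, 180.0, 10.0)
j_uniform = ensemble_j(chi_grid, np.ones_like(chi_grid), A, B, C)   # fully disordered
j_single = ensemble_j([180.0], [1.0], A, B, C)                      # one rigid rotamer
print(f"3J uniform = {j_uniform:.2f} Hz, 3J single rotamer = {j_single:.2f} Hz")
```

    Because the relation is nonlinear, the coupling computed from an averaged angle generally differs from the average of the couplings. This is why a distribution of χ angles, and not a single structure, is needed for the comparison.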

    Competing interests

    The authors declare that they have no competing interests.