Improved AlphaFold modeling with implicit experimental information

Abstract

Machine-learning prediction algorithms such as AlphaFold and RoseTTAFold can create remarkably accurate protein models, but these models usually have some regions that are predicted with low confidence or poor accuracy. We hypothesized that by implicitly including new experimental information such as a density map, a greater portion of a model could be predicted accurately, and that this might synergistically improve parts of the model that were not fully addressed by either machine learning or experiment alone. An iterative procedure was developed in which AlphaFold models are automatically rebuilt on the basis of experimental density maps and the rebuilt models are used as templates in new AlphaFold predictions. We show that including experimental information improves prediction beyond the improvement obtained with simple rebuilding guided by the experimental data. This procedure for AlphaFold modeling with density has been incorporated into an automated procedure for interpretation of crystallographic and electron cryo-microscopy maps.
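
The iterative procedure described in the abstract can be illustrated with a minimal control-flow sketch. Everything named here is hypothetical: `predict`, `rebuild` and `map_fit` are placeholders for an AlphaFold run (optionally seeded with a template), automatic map-guided rebuilding, and a model-to-map agreement score. The stopping rule (stop when the fit no longer improves) and the cycle limit are assumptions for illustration and are not taken from the article. The toy functions at the end only exercise the loop; they do not model any real structure.

```python
from typing import Callable, Optional, Tuple, TypeVar

Model = TypeVar("Model")


def iterate_prediction(
    predict: Callable[[Optional[Model]], Model],
    rebuild: Callable[[Model], Model],
    map_fit: Callable[[Model], float],
    max_cycles: int = 4,
) -> Tuple[Model, float]:
    """Alternate prediction and map-guided rebuilding, keeping the best-fitting model.

    predict, rebuild and map_fit are hypothetical callables supplied by the
    caller; higher map_fit values are assumed to mean better agreement.
    """
    if max_cycles < 1:
        raise ValueError("max_cycles must be at least 1")

    template: Optional[Model] = None  # the first prediction uses no template
    best_model: Optional[Model] = None
    best_fit = float("-inf")

    for _ in range(max_cycles):
        predicted = predict(template)  # new prediction, seeded by the last rebuilt model
        rebuilt = rebuild(predicted)   # rebuild the prediction against the density map
        fit = map_fit(rebuilt)
        if fit <= best_fit:
            break                      # assumed stopping rule: no further improvement
        best_model, best_fit = rebuilt, fit
        template = rebuilt             # the rebuilt model becomes the next template

    return best_model, best_fit


# Toy stand-ins: a "model" is just a number representing its error.
def toy_predict(template: Optional[float]) -> float:
    return 8.0 if template is None else template * 0.5


def toy_rebuild(model: float) -> float:
    return max(0.0, model - 1.0)


def toy_fit(model: float) -> float:
    return -model  # lower error means better fit
```

In this sketch each cycle feeds the rebuilt model back in as the template, which mirrors the idea that experimental information reaches the predictor implicitly rather than through an explicit restraint term.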

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/5841310.

    The authors set out to improve AlphaFold (AF) predictions iteratively with additional data, in this specific case cryo-EM maps. AF predictions are still sub-par for most residues: only approximately 36% of residues reach a pLDDT of 90 or above (see [1] and references therein). A pLDDT threshold of 90 is very strict, but depending on what a model will be used for it may be essential, for example when the structural model serves as a target in docking simulations. Success rates are even more dismal for whole polypeptide chain predictions. For example, it is not clear how many AF predictions have pLDDT > 90 for over 90% of their residues. Calling this measure an h-index (inspired by the h-index of citations), it is also unknown what fraction of proteins with an h-index of, say, 90 or above lack a template that would allow successful homology modelling of the residues contributing to the high h-index. What is certain is that this research direction is absolutely essential, as discussed in [2]. The same is true of homology modelling: added information can improve the quality of the results (the literature is too vast to single out any one article, but a quick search will show several pertinent results).

    Two points that I think could improve this manuscript are the following:

    1 - A table with the proteins used in the study, including the percentage of identity to the closest proteins used in AF training.

    2 - Expanding the analysis to a C-alpha displacement cutoff of 2 Å instead of only 3 Å. Many relevant interactions cannot be properly modelled with 3 Å displacements. Perhaps the authors could show distributions of the accuracy as a function of C-alpha displacement.

     

    [1] Jones, D. T. & Thornton, J. M. The impact of AlphaFold2 one year on. Nat Methods 19, 15–20 (2022).

    [2] Subramaniam, S. & Kleywegt, G. J. A paradigm shift in structural biology. Nat Methods 19, 20–23 (2022).
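
The second suggestion in the review above, reporting accuracy as a function of C-alpha displacement rather than at a single cutoff, could be computed along the following lines. This is a minimal sketch under stated assumptions: the model and reference C-alpha coordinates are already superposed and matched residue by residue, and the function name `fraction_within` and the default cutoffs are illustrative rather than taken from the article or the review.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def fraction_within(
    model_ca: Sequence[Coord],
    reference_ca: Sequence[Coord],
    cutoffs: Sequence[float] = (1.0, 2.0, 3.0),
) -> Dict[float, float]:
    """Fraction of C-alpha atoms within each displacement cutoff (in Å).

    Assumes both coordinate lists are superposed and in the same residue order.
    """
    if not model_ca or len(model_ca) != len(reference_ca):
        raise ValueError("coordinate lists must be non-empty and of equal length")

    # Per-residue displacement between model and reference C-alpha positions.
    distances = [math.dist(m, r) for m, r in zip(model_ca, reference_ca)]

    # For each cutoff, count the residues whose displacement does not exceed it.
    return {
        cutoff: sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    }
```

Evaluating the same model over a finer grid of cutoffs gives the cumulative distribution the reviewer asks for, and makes it visible how much of the reported accuracy depends on the choice of 3 Å.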