Using AlphaFold Multimer to discover interkingdom protein-protein interactions

Abstract

Structural prediction by artificial intelligence (AI) can be a powerful new instrument to discover novel protein-protein interactions, but the community still grapples with its implementation, opportunities and limitations. Here, we discuss and re-analyse our in-silico screen for novel pathogen-secreted inhibitors of immune hydrolases to illustrate the power and limitations of structural predictions. We discuss strategies of curating sequences, including controls, and reusing sequence alignments and highlight important limitations originating from platforms, sequence depth and computing times. We hope these experiences will support similar interactomic screens by the research community.

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/12167711.

    Preprint review of "Using AlphaFold Multimer to discover interkingdom protein-protein interactions"

    By Andrés Posbeyikian

    Summary

    This perspective by Felix Homma et al. describes both general aspects and details of how to apply AlphaFold Multimer (AFM) to the discovery of novel interkingdom protein-protein interactions. It is a brief and exciting report that is valuable to scientists with little or no prior experience with AFM. It does a good job of introducing important modelling concepts and highlights useful tips and tricks to optimize computing resources and modelling parameters.

    All in all, this reviewer believes the manuscript is well written, polished and complete. Some points require clarification, and suggestions are included in this review to enhance the clarity of the article. Nonetheless, the manuscript is already of high quality.

    As a general comment, and although this point may be obvious to most readers, it would be good if the authors stated the importance of downstream wet-lab validation of predicted complexes, and dedicated a line or two to putting protein modelling in general into context, as part of the bigger picture of a research project. See other comments below.

    Start with ColabFold online

    • This is a clear overview of the available options for running AFM and of how to start out with ColabFold online, which is quite useful for beginners.

    • L35. "[…] e.g. 20 complexes." Of what average sequence length?

    • For prediction of protein complexes, a relevant parameter that ColabFold provides is the MSA mode, which can be set to 'unpaired_paired', 'unpaired', or 'paired'. If the two input sequences are from phylogenetically distinct origins, then the MSA mode must not be set to paired. The paired mode constrains the sequence search space by assuming that the two input sequences originated from the same species. If they originate from different species, this can have a profound impact on the MSA depth and model quality of one of the two proteins. ColabFold is presented to the reader as a viable option for running AFM, so this parameter is a very relevant one to mention, given that this work is centered around interkingdom complex predictions.
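    The pairing point above could be enforced when jobs are prepared for a local ColabFold run. The following is only a minimal sketch: the helper names are hypothetical, the choice of 'unpaired' for cross-species pairs is one illustrative option (the comment above only rules out 'paired'), and the `--pair-mode` flag of `colabfold_batch` is assumed to accept the three values listed above, which should be checked against the installed version.

```python
from typing import List

# The three mode names quoted in the comment above.
PAIR_MODES = ("unpaired_paired", "unpaired", "paired")


def choose_pair_mode(species_a: str, species_b: str) -> str:
    """Hypothetical helper: never pick 'paired' for chains from different species."""
    same_species = species_a.strip().lower() == species_b.strip().lower()
    # Illustrative policy only: cross-species pairs get 'unpaired'.
    return "unpaired_paired" if same_species else "unpaired"


def build_command(input_path: str, output_dir: str, pair_mode: str) -> List[str]:
    """Build (but do not run) a colabfold_batch command line.

    Assumption: colabfold_batch exposes the setting as --pair-mode.
    """
    if pair_mode not in PAIR_MODES:
        raise ValueError(f"unknown pair mode: {pair_mode!r}")
    return ["colabfold_batch", "--pair-mode", pair_mode, input_path, output_dir]
```

    The command is returned as a list so that it can be inspected or logged before being passed to a job scheduler.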

    Small sequences model faster

    • Figure 3. The figure shows a line of best fit in gray, but it's not mentioned in the figure legend. Could the authors comment on this trend line?

    Use a computing cluster for screens

    • L42. '[…] feed AFM with […]' —> provide AFM with.

    • L43. '[…] extensive experiences […]' —> extensive experience.

    • L47. '[…] computing clusters (e.g. access to over 70 GPUs in parallel) will enable to predict ~1,000 protein pairs.' As mentioned in the section titled 'Small sequences model faster', the length of a sequence has an impact on its modelling time. Because protein pairs can come in any size, it would be good to accompany this estimate of roughly 1000 protein pairs per day with a rough numerical estimate of the average sequence length this was calculated for.

    • L49. 'We noticed striking differences in ipTM+pTM and average plDDT scores between the online AFM implementation via ColabFold and AFM on the cluster.' These metrics (ipTM, pTM and average pLDDT) are mentioned for the first time in this section, but their meaning is only introduced 2 pages later in the section titled 'Evaluate the predicted scores'.  Perhaps the authors can either introduce them earlier by rearranging sections, or alternatively redirect the reader to the corresponding section for details.

    Figure 2.

    It would be good if the authors reported a comparison of the configuration parameters used for the runs on ColabFold online and for the local version of ColabFold (number of recycles, whether AMBER side-chain relaxation was applied, the pairing mode, etc.). If the difference in confidence scores between installations is attributed to the difference in MSA depth (which in turn is attributed to the different sequence databases queried by each installation), could the authors measure the mean gapless MSA depth for each complex predicted through each configuration and compare them? Calculating this shouldn't be a challenge, as a similar analysis has been done for the data shown in Figure 5.
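    As an illustration of the requested measurement, a mean non-gap MSA depth can be computed directly from an a3m alignment. This is a minimal sketch under stated assumptions, not the authors' method: depth is defined here as the number of sequences (query included) with a residue at a column, averaged over all query columns; lowercase letters are treated as a3m insertions and dropped; and lines starting with '#' are treated as header lines and skipped. The function names are hypothetical.

```python
from typing import List


def parse_a3m(text: str) -> List[str]:
    """Return aligned rows from a3m text, with lowercase insertions removed."""
    rows: List[str] = []
    current: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' header lines
        if line.startswith(">"):
            if current:
                rows.append("".join(current))
                current = []
            continue
        # Lowercase characters are insertions relative to the query.
        current.append("".join(c for c in line if not c.islower()))
    if current:
        rows.append("".join(current))
    return rows


def mean_non_gap_depth(rows: List[str]) -> float:
    """Average, over alignment columns, of the number of non-gap residues."""
    if not rows:
        return 0.0
    n_cols = len(rows[0])
    counts = [0] * n_cols
    for row in rows:
        if len(row) != n_cols:
            raise ValueError("rows have unequal aligned length")
        for i, char in enumerate(row):
            if char != "-":
                counts[i] += 1
    return sum(counts) / n_cols


example = "#4\t1\n>query\nMKTA\n>hit1\nMK-A\n>hit2\nM-aTA\n"
print(mean_non_gap_depth(parse_a3m(example)))  # 2.5
```

    Applying the same function to the alignments from both installations would give one number per complex and per configuration, which could then be compared directly.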

    Try to get MSA>100

    • L140. '[…] lower MSAs […]' —> shallower MSAs.

    Evaluate the predicted scores

    • L158-159. "ColabFold summarises the MSA depth in a graphic presentation but does not provide the .pkl file." While this is true, ColabFold also provides a3m files which contain the MSAs which could be analysed to calculate mean non-gap MSA depth.

    • L155-157. "The plDDT confidence scores are documented in the .pdb files and can be visualised in PyMol with command [spectrum b, rainbow_rev, minimum=0, maximum=100]." While this is a convenient command for PyMol users, it should be noted that the color scale conventionally used for plotting pLDDT data has 4 bins:

    Very high confidence, dark blue: pLDDT > 90

    High confidence, light blue: 90 > pLDDT > 70

    Low confidence, yellow: 70 > pLDDT > 50

    Very low confidence, orange: pLDDT < 50

    The position of residues with an assigned pLDDT score of 50 or less should not be trusted, and such residues can even be considered to belong to a turn or an intrinsically disordered region. The PyMol command mentioned here effectively transforms the pLDDT color scale into a linear reverse rainbow scale with a minimum at 0 and a maximum at 100 (as can be seen in Figures 4 and 7), so values of pLDDT = 50 are given a green hue. A reader used to the conventional AlphaFold scale would read this as moderate-to-high confidence, when in fact the position should not be trusted. This may confuse or mislead such a reader. By contrast, the ChimeraX 'color by pLDDT' command avoids this issue.
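    To make the four-bin convention concrete, per-residue pLDDT values can be read from the B-factor column of a predicted .pdb file and assigned to the bins listed above. This is a minimal sketch with hypothetical function names. Two assumptions are made: one value is taken per residue from its CA atom, and boundary values (exactly 90, 70 or 50) fall into the lower bin, in line with the remark that a score of 50 or less should not be trusted.

```python
from collections import Counter
from typing import Dict, List


def plddt_bin(score: float) -> str:
    """Map a pLDDT value to one of the four conventional confidence bins."""
    if score > 90:
        return "very high"
    if score > 70:
        return "high"
    if score > 50:
        return "low"
    return "very low"


def ca_plddt_from_pdb(pdb_text: str) -> List[float]:
    """Collect pLDDT values from the B-factor field of CA atoms.

    Uses the fixed-column PDB layout: atom name in columns 13-16 and
    temperature factor in columns 61-66.
    """
    scores: List[float] = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def bin_counts(scores: List[float]) -> Dict[str, int]:
    """Count how many residues fall into each confidence bin."""
    return dict(Counter(plddt_bin(s) for s in scores))
```

    A summary such as the fraction of interface residues in the two lowest bins could then be reported alongside the structure figures.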

    Use ipTM+pTM to select candidate interactions

    • The unweighted pTM+ipTM metric gives equal relevance to the overall individual chain prediction confidence (pTM) and to the confidence of the residues predicted at the interface between the two chains (ipTM). Could the authors comment on why they chose to analyse their data through the direct addition of these two scores (pTM + ipTM), instead of using the weighted arithmetic mean that is described and employed by the AFM authors and others in the community (model confidence = 0.2 pTM + 0.8 ipTM)? This metric gives much more importance to the confidence at the chain interface. How do the authors think their analysis shown in Figure 2 would be affected by plotting model confidence instead of pTM+ipTM?
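    The difference between the two metrics can be illustrated with a small worked example. The score values below are invented for illustration and are not taken from the manuscript; the function names are hypothetical. Two models with the same unweighted sum can differ clearly in the weighted model confidence.

```python
def summed_score(iptm: float, ptm: float) -> float:
    """Unweighted sum as used in the manuscript: ipTM + pTM."""
    return iptm + ptm


def model_confidence(iptm: float, ptm: float) -> float:
    """Weighted mean quoted above: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * iptm + 0.2 * ptm


# Hypothetical models: (ipTM, pTM). Both have the same unweighted sum.
models = {
    "good chains, weak interface": (0.3, 0.9),
    "weaker chains, better interface": (0.7, 0.5),
}

for name, (iptm, ptm) in models.items():
    print(f"{name}: sum={summed_score(iptm, ptm):.2f}, "
          f"confidence={model_confidence(iptm, ptm):.2f}")
```

    In this invented case both models have a sum of 1.20, but the weighted confidence is 0.42 for the first and 0.66 for the second, so a ranking based on the weighted score would favour the model with the more confident interface.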

    Beware of false negatives

    • L186-187. "Interestingly, the AFM-predicted models also fail to adequately connect the disulphide bridges within Avr2, which have been determined experimentally (van 't Klooster et al., 2011)." What do the authors mean by 'connect'? As mentioned later in this review, AF2 will not show any covalent bonds between amino acid sidechains. By 'adequately connect', do the authors mean that the thiol groups are placed within disulphide-bridge range?

    • L187. "Feeding AFM with additional Avr2 sequences will presumably […]" —> Providing/Running/Executing AFM with […]

    Explore hits manually

    • In Figure 1, subpanel 5, the authors very clearly illustrate the classification of models into 4 classes. The text in this section doesn't explicitly mention these 4 classes, although it describes their criteria. It could make for a neater explanation for the reader if this section was structured sequentially in accordance with these classes.

    • When the authors mention an "absence of disulphide bridges in Cys-rich SSPs", what do they mean exactly? AF2 models do not show covalent bonds between amino acid sidechains. This means that disulphide bridges must be assigned on the structure by post-processing the predicted model, based on spatial criteria such as the proximity of the two thiol groups (the disulphide bond length is on average 2.05 Å). A 2023 publication by Willems et al. (PMID: 36493985) describes the process of annotating cysteine disulphide bridges and metal binding sites in the plant kingdom through AlphaFold2 predicted structures. If a person with no prior knowledge of AlphaFold reads this work and is also working with structures that contain this feature, it may be relevant to them to see the disulphide bridge assignment criteria addressed.

    • L219. "We disregarded most complexes as inhibitor candidates because they did not block the active site or lacked an intrinsic structure." What is defined as a 'lack of intrinsic structure'? Could the authors be more specific here? Were there no secondary structure elements at all? Were there some secondary structure elements but predicted with very low confidence?

    • Finally, the authors briefly mention they attempted to automate the classification task. Could the authors elaborate on why they think the classification task couldn't be automated (L221 - 224)? What additional criteria or intuition does the manual analysis bring, that escapes the automated analysis?

    • Determining whether the active site is present at the binding interface is probably already solving an important part of the automation and helps to analyse a full dataset. The authors could claim that they have 'partially' simplified and automated the classification of the predicted complexes, reducing the number of models that an observer needs to look at before refining the classification between Classes 2 and 3.

    • Could this analysis be combined with input from secondary structure annotation tools, to check whether the ligand is adopting what the authors define as 'intrinsic structure' in the vicinity of the active site?
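    Regarding the disulphide-bridge comment earlier in this section, a spatial criterion of the kind described there can be applied to a predicted .pdb file by measuring distances between cysteine SG atoms. This is a minimal sketch with hypothetical function names, not the procedure of Willems et al. The 2.5 Å cutoff is an illustrative tolerance around the 2.05 Å average bond length mentioned above and should be chosen and justified by the authors.

```python
import math
from itertools import combinations
from typing import List, Tuple

# (chain ID, residue number, x, y, z) of a cysteine SG atom
SGAtom = Tuple[str, int, float, float, float]


def cys_sg_atoms(pdb_text: str) -> List[SGAtom]:
    """Extract cysteine SG atoms using the fixed-column PDB layout."""
    atoms: List[SGAtom] = []
    for line in pdb_text.splitlines():
        if (line.startswith("ATOM")
                and line[12:16].strip() == "SG"
                and line[17:20] == "CYS"):
            atoms.append((
                line[21],             # chain ID (column 22)
                int(line[22:26]),     # residue number (columns 23-26)
                float(line[30:38]),   # x
                float(line[38:46]),   # y
                float(line[46:54]),   # z
            ))
    return atoms


def candidate_disulphides(atoms: List[SGAtom], max_dist: float = 2.5):
    """Return cysteine pairs whose SG-SG distance is within max_dist (in Å).

    max_dist is an assumed tolerance, not a value from the manuscript.
    """
    pairs = []
    for a, b in combinations(atoms, 2):
        dist = math.dist(a[2:], b[2:])
        if dist <= max_dist:
            pairs.append((a[:2], b[:2], round(dist, 2)))
    return pairs
```

    Comparing the returned pairs with the experimentally determined bridges would make a statement such as 'fails to connect the disulphide bridges' quantitative.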

    Remaining challenges

    • L233. 'Weaker, temporal interactions involving […]' —> transient interactions

    AlphaFold 3

    • L245. 'However, we found that ipTM+pTM scores for complexes of SlP69B with CfEcp36, FoSix15 and FoTIL predicted by AlphaFold 3 are still below the 0.75 threshold, in contrast to our customised AFM workflow.' The AF3 program architecture is very different from the one of AF2. For this reason, there's an array of new metrics that AF3 provides as output, aside from the ipTM and pTM scores. For instance, there's a new score for clashes which is used as a penalization for the computation of the ranking score (ranking_score = 0.8 × ipTM + 0.2 × pTM + 0.5 × disorder − 100 × has_clash) (Abramson et al. 2024, PMID: 38718835). The ipTM+pTM metric may need to be updated, and likely is not a great indicator of confidence for AF3.

    • Another point the authors could address here is that there is currently no support for high-throughput screens using the AlphaFold3 server, not only because of the limit of 20 jobs per day, but also because of the guidelines, terms and conditions imposed by Google. This situation would improve if the AlphaFold3 code were shared as open source, but unfortunately it is still unknown when or if this will happen (DeepMind informally announced it might make the AlphaFold3 code and model weights available for academic use within six months).
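    For reference, the AF3 ranking score quoted in the first point of this section can be written as a short function. This is only a sketch of the formula as cited there (Abramson et al. 2024). The function name and the input values in the usage note are hypothetical, and 'disorder' is taken as given in the formula without further interpretation.

```python
def af3_ranking_score(iptm: float, ptm: float,
                      disorder: float, has_clash: bool) -> float:
    """ranking_score = 0.8*ipTM + 0.2*pTM + 0.5*disorder - 100*has_clash."""
    return 0.8 * iptm + 0.2 * ptm + 0.5 * disorder - 100.0 * float(has_clash)
```

    With invented values ipTM = 0.6, pTM = 0.7 and disorder = 0.1, the score is about 0.67 without a clash, whereas the same model flagged with a clash drops to about -99.33. This illustrates why a plain ipTM+pTM threshold may not carry over to AF3 output.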

    Competing interests

    The author declares that they have no competing interests.