High-accuracy mapping of human and viral direct physical protein-protein interactions using the novel computational system AlphaFold-pairs

Abstract

Protein-protein interactions are central, highly flexible components of regulatory mechanisms in all living cells. Over the years, diverse methods have been developed to map protein-protein interactions. These methods have revealed the organization of protein complexes and networks in numerous cells and conditions. However, these methods are also time consuming, costly and sensitive to various experimental artifacts. To avoid these caveats, we have taken advantage of the AlphaFold-Multimer software, which succeeded in predicting the structure of many protein complexes. We designed a relatively simple algorithm based on assessing the physical proximity of a test protein with other AlphaFold structures. Using this method, named AlphaFold-pairs, we have successfully defined the probability of a protein-protein interaction forming. AlphaFold-pairs was validated using well-defined protein-protein interactions found in the literature and specialized databases. All pairwise interactions forming within the 12-subunit transcription machinery RNA Polymerase II, according to available structures, have been identified. Out of 66 possible interactions (excluding homodimers), 19 specific interactions have been found, and an additional previously unknown interaction has been unveiled. The SARS-CoV-2 surface glycoprotein Spike (or S) was confirmed to interact with high preference with the human ACE2 receptor when compared to other human receptors. Notably, two additional receptors, INSR and FLT4, were found to interact with S. For the first time, we have successfully identified protein-protein interactions that are likely to form within the reassortant Eurasian avian-like (EA) H1N1 swine G4 genotype Influenza A virus, which poses a potential zoonotic threat. 
Testing G4 proteins against human transcription factors and molecular chaperones (a total of 100 proteins) revealed strong specific interactions between the G4 HA and HSP90B1, the G4 NS and the PAQosome subunit RPAP3, as well as the G4 PA and the POLR2A subunit. We predict that AlphaFold-pairs will revolutionize the study of protein-protein interactions in a large number of healthy and diseased systems in the years to come.

Peer-Review, AlphaFold-Pairs

This paper presents methods and results for comparing protein-protein interaction predictions. The AlphaFold-Pairs method is based on the popular AlphaFold-Multimer model, and adds a new scoring method and a strategy to screen against false positive interactions by comparing all proteins in a test set against one another. As with the original AlphaFold-Multimer model, the authors provide reasonable evidence that AlphaFold-Multimer is apparently capable of capturing valid protein-protein interactions with few false positives. The paper is very clearly written, and the authors provide a Nextflow implementation of their workflow that seems accessible and straightforward for new users.

AlphaFold-Pairs performs pairwise all-by-all protein-protein interaction prediction for an input list of proteins. Since most proteins do not specifically interact with one another, this approach often includes negative interactions that can be used for comparison against high-scoring true positive interactions. AlphaFold-Pairs uses an amino acid distance metric to score putative protein-protein interactions.
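
To make the all-by-all design concrete, here is a minimal sketch of how the pair list grows and what a distance-based score might look like. It is not the authors' implementation: the function names, the placeholder subunit labels, and the 8 angstrom cutoff are all assumptions chosen for illustration, and the paper's actual metric may differ.

```python
from itertools import combinations
import math

def all_pairs(names):
    """Return every unordered pair of distinct names (no homodimers)."""
    return list(combinations(names, 2))

def contact_count(coords_a, coords_b, cutoff=8.0):
    """Count residue pairs, one from each chain, within `cutoff` angstroms.

    coords_a and coords_b are lists of (x, y, z) tuples, e.g. one
    representative atom per residue. The cutoff is an illustrative assumption.
    """
    return sum(
        1
        for a in coords_a
        for b in coords_b
        if math.dist(a, b) <= cutoff
    )

# Twelve placeholder labels standing in for a 12-subunit complex.
subunits = [f"subunit_{i}" for i in range(1, 13)]
pairs = all_pairs(subunits)
print(len(pairs))  # 66, i.e. 12 * 11 / 2
```

The count of 66 matches the number of possible heterodimeric pairs quoted in the abstract for the 12-subunit RNA Polymerase II. Because the number of pairs grows quadratically with the input list, list size matters for compute cost.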

Overall, I would appreciate more thorough evaluation of the AlphaFold-Pairs approach. There are a number of aspects of the method that are worthy of further exploration, which I will detail below.

How do different scoring metrics perform? The AlphaFold-Multimer algorithm includes a scoring method for protein-protein interactions. A comparison between that built-in score and the distance-based score in this work would be greatly appreciated. Is a new scoring method necessary? If so, why?
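
One way such a comparison could be framed, sketched here under the assumption that a labelled set of known interacting and non-interacting pairs is available: compute the area under the ROC curve for each metric on the same pairs and compare the two values. The function name `roc_auc` is hypothetical and the approach is a suggestion, not something described in the paper.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive pair scores
    higher than a randomly chosen negative pair (ties count as one half).
    Assumes higher scores mean a more likely interaction.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Running this once per metric on the same labelled pairs gives two numbers between 0 and 1. A value near 0.5 means the metric separates positives from negatives no better than chance.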

How do scoring metrics compare with experimental data? Many protein-protein interactions have been experimentally tested and assigned binding affinities. Is there a correlation between in silico binding metrics and experimentally determined binding affinities?
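
A rank correlation is one simple way to ask this question, since it does not assume a linear relationship between a predicted score and a measured affinity. The sketch below implements Spearman's coefficient with the standard library only; the function names are illustrative, and no real measurements are included.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

If affinities are reported as dissociation constants, where a lower value means tighter binding, a score that rises with interaction likelihood would be expected to correlate negatively with them.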

What is the effect of the linker on the results? AlphaFold-Multimer is capable of folding multi-domain proteins. It can be used for PPIs when two proteins that may interact are combined into a single, synthetic polypeptide chain. After folding, the protein domains are analyzed to determine if any inter-domain interactions are biologically plausible and indicative of protein-protein binding. Because the two proteins are combined into a single chain, a long flexible linker is often included between the two domains. This linker may or may not be critical for the efficacy of AlphaFold-Multimer and AlphaFold-Pairs in identifying protein-protein interactions, and again an analysis of linker length and composition would be much appreciated.
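
For readers unfamiliar with the fused-chain setup described above, the following sketch builds such a construct and records where each protein sits in it, so that linker length and composition can be varied systematically. The repeated GGGGS unit is a commonly used flexible linker, but its use here, the default repeat count, and the function name are assumptions; the review does not state what linker, if any, the authors used.

```python
def fuse_with_linker(seq_a, seq_b, unit="GGGGS", repeats=6):
    """Join two sequences with a repeated linker unit.

    Returns the fused sequence plus the 0-based, half-open index ranges
    occupied by seq_a and seq_b, so inter-protein contacts can be
    separated from contacts that involve linker residues.
    """
    linker = unit * repeats
    fused = seq_a + linker + seq_b
    a_range = (0, len(seq_a))
    b_range = (len(seq_a) + len(linker), len(fused))
    return fused, a_range, b_range

# Toy sequences, not real proteins.
fused, a_range, b_range = fuse_with_linker("MKT", "AVL", repeats=2)
print(fused)  # MKTGGGGSGGGGSAVL
```

Sweeping `repeats` or swapping `unit` for a different composition, then rescoring each construct, would show how sensitive a predicted interaction is to the linker.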

What is the optimal way to prepare a set of bait and target proteins? In this work, a set of proteins that are thought to interact is tested in an all-by-all fashion. This raises the question of how to optimally prepare lists of bait and target proteins that can confidently identify true positive PPIs. I would appreciate an investigation into the target/bait protein lists. It is possible that including known protein-protein interactions as well as known negative interactions would provide valuable context that would help a user interpret the results of AlphaFold-Pairs. Currently there are no guidelines for users to prepare lists of bait/target proteins.
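
As one illustration of how known negative interactions could provide that context, the sketch below places a candidate pair's score against the scores of negative-control pairs from the same run. The function name and the add-one correction are assumptions made for illustration, not part of AlphaFold-Pairs.

```python
def empirical_p(score, negative_scores):
    """Fraction of negative-control scores at or above `score`.

    Adding one to the numerator and denominator keeps the estimate above
    zero when no control reaches the candidate score. Assumes higher
    scores mean a more likely interaction.
    """
    n_at_or_above = sum(1 for s in negative_scores if s >= score)
    return (n_at_or_above + 1) / (len(negative_scores) + 1)
```

With this framing, the smallest value a run can report is 1 / (number of negative controls + 1), which gives a concrete reason to include enough negative pairs in a bait/target list.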
