PAbFold: Linear Antibody Epitope Prediction using AlphaFold2

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    In this manuscript, the authors describe a new AlphaFold2 pipeline called PAbFold that can represent a useful tool for identifying linear antibody epitopes (B-cell epitopes) for different antigens. This information can be used in the selection of different reagents in competitive ELISA assays, which can save time and reduce costs. Several questions, however, remain and the study is currently incomplete.

Abstract

Defining the binding epitopes of antibodies is essential for understanding how they bind to their antigens and perform their molecular functions. However, while determining linear epitopes of monoclonal antibodies can be accomplished utilizing well-established empirical procedures, these approaches are generally labor- and time-intensive and costly. To take advantage of the recent advances in protein structure prediction algorithms available to the scientific community, we developed a calculation pipeline based on the localColabFold implementation of AlphaFold2 that can predict linear antibody epitopes by predicting the structure of the complex between antibody heavy and light chains and target peptide sequences derived from antigens. We found that this AlphaFold2 pipeline, which we call PAbFold, was able to accurately flag known epitope sequences for several well-known antibody targets (HA / Myc) when the target sequence was broken into small overlapping linear peptides and antibody complementarity determining regions (CDRs) were grafted onto several different antibody framework regions in the single-chain antibody fragment (scFv) format. To determine if this pipeline was able to identify the epitope of a novel antibody with no structural information publicly available, we determined the epitope of a novel anti-SARS-CoV-2 nucleocapsid-targeted antibody using our method and then experimentally validated our computational results using peptide competition ELISA assays. These results indicate that the AlphaFold2-based PAbFold pipeline we developed is capable of accurately identifying linear antibody epitopes in a short time using just antibody and target protein sequences. This emergent capability of the method is sensitive to methodological details such as peptide length, AlphaFold2 neural network versions, and multiple-sequence alignment database. PAbFold is available at https://github.com/jbderoo/PAbFold.
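
The step of breaking a target sequence into small overlapping linear peptides can be illustrated with a short sliding-window sketch. This is not taken from the PAbFold code base: the function name, the default window length of 10, and the step of 1 are assumptions chosen for illustration, and the abstract notes that results are sensitive to peptide length. The example antigen is the Myc tag (EQKLISEEDL) flanked by made-up linker residues.

```python
def overlapping_peptides(sequence, length=10, step=1):
    """Return (start_index, peptide) pairs from a sliding window over `sequence`.

    `length` and `step` are illustrative defaults, not values from the paper.
    With step > 1 the final residues may not be covered by any window.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(sequence) < length:
        # Sequence shorter than one window: return it whole (or nothing if empty).
        return [(0, sequence)] if sequence else []
    last_start = len(sequence) - length
    return [(i, sequence[i:i + length]) for i in range(0, last_start + 1, step)]


# Myc tag flanked by hypothetical linker residues (illustration only).
antigen = "GGS" + "EQKLISEEDL" + "GGS"
for start, peptide in overlapping_peptides(antigen):
    print(start, peptide)
```

Each peptide would then be paired with the antibody sequence for a separate structure prediction; keeping the start index makes it possible to map per-peptide scores back onto antigen positions afterwards.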

Article activity feed

  1. eLife assessment

    In this manuscript, the authors describe a new AlphaFold2 pipeline called PAbFold that can represent a useful tool for identifying linear antibody epitopes (B-cell epitopes) for different antigens. This information can be used in the selection of different reagents in competitive ELISA assays, which can save time and reduce costs. Several questions, however, remain and the study is currently incomplete.

  2. Reviewer #1 (Public Review):

    Summary:

    In this manuscript, "PAbFold: Linear Antibody Epitope Prediction using AlphaFold2", the authors generate a Python wrapper for the screening of antibody-peptide interactions using AlphaFold, and test the performance of AlphaFold on 3 antibody-peptide complexes. In line with previous observations regarding the ability of AlphaFold to predict antibody structures and antigen binding, the results are mixed. While the authors are able to use AlphaFold to identify and experimentally validate a previously characterized broad binding epitope with impressive precision, they are unable to consistently identify the proper binding registers for their control (Myc-tag, HA-tag) peptides. Further, it appears that the reproducibility and generality of these results are low, with new versions of AlphaFold negatively impacting the predictive power. However, if this reproducibility issue is solved, and the test set is greatly increased, this manuscript could contribute strongly towards our ability to predict antibody-antigen interactions.

    Strengths:

    Due to the high significance, but difficulty, of the prediction of antibody-antigen interactions, any attempts to break down these predictions into more tractable problems should be applauded. The authors' approach of focusing on linear epitopes (peptides) is clever, reducing some of the complexities inherent to antibody binding. Further, the ability of AlphaFold to narrow down a previously broadly identified experimental epitope is impressive. The subsequent experimental validation of this more precisely identified epitope makes for a nice data point in the assessment of AlphaFold's ability to predict antibody-antigen interactions.

    Weaknesses:

    Without a larger set of test antibody-peptide interactions, it is unclear whether or not AlphaFold can precisely identify the binding register of a given antibody to a given peptide antigen. Even within the small test set of 3 antibody-peptide complexes, performance is variable and depends, for unclear reasons, upon the scFv scaffold used. Lastly, the apparent poor reproducibility is concerning, and it is not clear why the results should rely so strongly on which multiple sequence alignment (MSA) version is used, when neither the antibody CDR loops nor the peptide are likely to strongly rely on these MSAs for contact prediction.

    Major Point-by-Point Comments:

    (1) The central concern for this manuscript is the apparent lack of reproducibility. From the way the authors discuss the issue (lines 523-554), it sounds as though they are unable to reproduce their initial results (which are reported in the main text), even when previous versions of AlphaFold2 are used. If this is the case, it does not seem that AlphaFold can be a reliable tool for predicting antibody-peptide interactions.

    (2) Aside from the fundamental issue of reproducibility, the number of validating tests is insufficient to assess the ability of AlphaFold to predict antibody-peptide interactions. Given the authors' use of AlphaFold to identify antibody binding to a linear epitope within a whole protein (in the mBG17:SARS-CoV-2 nucleocapsid protein interaction), they should expand their test set well beyond Myc- and HA-tags using antibody-antigen interactions from existing large structural databases.

    (3) As discussed in lines 358-361, the authors are unsure if their primary control tests (antibody binding to Myc-tag and HA-tag) are included in the training data. Lines 324-330 suggest that even if the peptides are not included in the AlphaFold training data because they contain fewer than 10 amino acids, the antibody structures may very well be included, with an obvious "void" that would be best filled by a peptide. The authors must confirm that their tests are not included in the AlphaFold training data, or re-run the analysis with these templates removed.

    (4) The ability of AlphaFold to refine the linear epitope of antibody mBG17 is quite impressive and robust to the reproducibility issues the authors have run into. However, Figure 4 seems to suggest that the target epitope adopts an alpha-helical structure. This may be why the score is so high and the prediction is so robust. It would be very useful to see, alongside the per-residue pLDDT plots, a per-residue structure prediction plot. This would help to show whether the high-confidence pLDDT comes more from confidence in the docking of the peptide or from confidence in the structure of the peptide.

    (5) Related to the above comment, pLDDT is insufficient as a metric for assessing antibody-antigen interactions. There is a chance (as is nicely shown in Figure S3C) that AlphaFold can be confident and wrong. Here we see two orange-yellow dots (fairly high confidence) that place the peptide COM far from the true binding region. While running the recommended larger validation above, the authors should also include a peptide RMSD or COM distance metric, to show that the peptide identity is confident, and the peptide placement is roughly correct. These predictions are not nearly as valuable if AlphaFold is getting the right answer for the wrong reasons (i.e. high pLDDT but peptide binding to a non-CDR loop region). Eventual users of the software will likely want to make point mutations or perturb the binding regions identified by the structural predictions (as the authors do in Figure 4).
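
The COM distance check suggested in point (5) could be sketched roughly as follows. This is an assumption-laden illustration, not part of PAbFold: it takes C-alpha coordinates that have already been extracted from a predicted complex, uses the unweighted centroid as a stand-in for the center of mass, and applies a hypothetical 10 Å cutoff that would need to be calibrated against known structures.

```python
import math


def centroid(coords):
    """Unweighted geometric center of a list of (x, y, z) coordinates."""
    if not coords:
        raise ValueError("coords must not be empty")
    n = len(coords)
    return tuple(sum(point[axis] for point in coords) / n for axis in range(3))


def com_distance(peptide_coords, cdr_coords):
    """Distance between the peptide centroid and the CDR-residue centroid."""
    return math.dist(centroid(peptide_coords), centroid(cdr_coords))


def is_near_paratope(peptide_coords, cdr_coords, cutoff=10.0):
    """Flag whether the peptide sits near the CDRs.

    `cutoff` (in angstroms) is a hypothetical threshold, not a published value.
    """
    return com_distance(peptide_coords, cdr_coords) <= cutoff


# Toy coordinates (illustration only): peptide centroid is (1, 0, 0).
peptide = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
cdrs = [(1.0, 3.0, 4.0)]
print(com_distance(peptide, cdrs), is_near_paratope(peptide, cdrs))
```

Reporting this distance next to pLDDT would separate predictions that are confident and correctly placed from those that are confident but dock the peptide away from the CDR loops.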

  3. Reviewer #2 (Public Review):

    Summary:

    The authors showed the applicability and usefulness of a new AlphaFold2 pipeline called PAbFold, which can predict linear antibody epitopes (B-cell epitopes) that can be helpful for the selection of reagents to be applied in competitive ELISA assays.

    Strengths:

    The authors showed the accuracy of the pipeline in correctly identifying the binding epitope for three different antibody-antigen systems (Myc, HA, and SARS-CoV-2 nucleocapsid protein). The design of scFvs from the Fabs of the three antibodies to speed up the analysis is extremely interesting.

    Weaknesses:

    The article correctly justifies its findings and no major weaknesses are present. However, it could be useful for a broader audience to show in detail how pLDDT was calculated for both the Simple-Max approach (per-residue pLDDT) and the Consensus analysis (average pLDDT for each peptide), with associated equations.
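
To make the distinction between a per-residue score and a per-peptide average concrete, here is a minimal sketch of one plausible reading of the two summaries. The manuscript's actual Simple-Max and Consensus definitions may differ; the function names, the input layout (one list of per-residue pLDDT values per overlapping peptide, plus each peptide's start index in the antigen), and the toy numbers are all assumptions.

```python
from statistics import mean


def per_peptide_mean(window_plddts):
    """One score per peptide: the average of its per-residue pLDDT values."""
    return [mean(plddt) for plddt in window_plddts]


def per_residue_max(window_plddts, starts, antigen_length):
    """One score per antigen position: the highest pLDDT that position
    received in any overlapping peptide that covers it.
    Positions covered by no peptide are left as None."""
    best = [None] * antigen_length
    for start, plddt in zip(starts, window_plddts):
        for offset, value in enumerate(plddt):
            position = start + offset
            if best[position] is None or value > best[position]:
                best[position] = value
    return best


# Toy data: three 3-residue peptides tiled over a 5-residue antigen.
starts = [0, 1, 2]
plddts = [[30, 40, 50], [60, 90, 80], [70, 40, 20]]
print(per_peptide_mean(plddts))
print(per_residue_max(plddts, starts, antigen_length=5))
```

In this toy case the per-residue maximum peaks at position 2, while the per-peptide mean singles out the second peptide; writing both out explicitly is the kind of detail the requested equations would supply.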