Structure-based prediction of T cell receptor:peptide-MHC interactions

Curation statements for this article:
  • Curated by eLife

    eLife assessment

The author customises an AlphaFold Multimer neural network to predict TCR:pMHC complexes and applies it to the problem of identifying, from a limited library of candidate peptides, those that might engage a TCR of known sequence. This is an important structural problem, and the work is a useful step that can be further improved through better metrics, comparison to existing approaches, and consideration of the sensitivity of the recognition process to small changes in structure.


Abstract

The regulatory and effector functions of T cells are initiated by the binding of their cell-surface T cell receptor (TCR) to peptides presented by major histocompatibility complex (MHC) proteins on other cells. The specificity of TCR:peptide-MHC interactions, thus, underlies nearly all adaptive immune responses. Despite intense interest, generalizable predictive models of TCR:peptide-MHC specificity remain out of reach; two key barriers are the diversity of TCR recognition modes and the paucity of training data. Inspired by recent breakthroughs in protein structure prediction achieved by deep neural networks, we evaluated structural modeling as a potential avenue for prediction of TCR epitope specificity. We show that a specialized version of the neural network predictor AlphaFold can generate models of TCR:peptide-MHC interactions that can be used to discriminate correct from incorrect peptide epitopes with substantial accuracy. Although much work remains to be done for these predictions to have widespread practical utility, we are optimistic that deep learning-based structural modeling represents a path to generalizable prediction of TCR:peptide-MHC interaction specificity.

Article activity feed

  1. Author Response

    eLife assessment

    The author customises an AlphaFold Multimer neural network to predict TCR:pMHC complexes and applies it to the problem of identifying, from a limited library of candidate peptides, those that might engage a TCR of known sequence. This is an important structural problem, and the work is a useful step that can be further improved through better metrics, comparison to existing approaches, and consideration of the sensitivity of the recognition process to small changes in structure.

    I appreciate the time taken by the editor and reviewers to assess this manuscript. In response to their comments, I've made significant changes and additions to the manuscript, most importantly adding (1) comparisons to TCRpMHCmodels and sequence-similarity based template selection, (2) analysis of peptide modeling accuracy in structure prediction and epitope prediction, (3) analysis and discussion of bias in the ternary structure database, (4) identification of key factors driving structure prediction accuracy, (5) binding predictions for three experimental systems with altered peptide ligand data, and (6) additional discussion of the generalizability of the epitope specificity prediction results to systems without structural characterization.

    One minor correction to the wording of the above assessment: the AlphaFold network used as the basis of our protocol is the original "monomer" network, not the multimer network. We chose to start from the monomer network because it was not trained on complexes, allowing for a more accurate assessment of the expected performance when modeling unseen TCR:pMHC complexes. On the other hand, performance comparisons such as in Fig. 2 are made to the AlphaFold multimer pipeline, since that pipeline can directly build models of complexes.

    Reviewer #1 (Public Review):

    The author has generated a specific version of the AlphaFold deep neural network-based protein folding prediction programme for TCR-pMHC docking. The AlphaFold multimer programme doesn't perform well for TCR-pMHC docking because the TCR uses random amino acids in the CDRs and the docking geometry is flexible. A version of AlphaFold was developed that provides templates for TCR alpha-beta pairing and docking with class I pMHC. This enables structural predictions that can be used to rank a set of peptides for docking with a TCR and identify the best peptide based on the quality of the structural prediction, with the best binders having the smallest residuals. This approach provides a step toward more general prediction and may immediately solve a class of practical problems in which one wants to determine which pMHC a given TCR recognizes from a limited set of possible peptides.

    Very minor point: the structure prediction pipeline (Fig. 2) handles both MHC class I and class II complexes. For epitope binding specificity prediction (Figs. 3-6), I only tested MHC class I targets due to limitations in data availability (very few class II epitopes have both mapped TCR repertoires and solved ternary complexes).

    Reviewer #2 (Public Review):

    The application of AlphaFold to the prediction of the peptide-TCR recognition process is not without challenge; at heart, this is a multi-protein recognition event. While AlphaFold does very well at modelling single protein chains, its handling of multi-chain interactions, such as those of antibody-antigen pairs, has been substantially worse than for other targets (Ghani et al. 2021). This has led to the development of specialised pipelines that tweak the prediction process to improve the prediction of such key biological interactions. Prediction of individual TCR:pMHC complexes shares many of the challenges apparent within antibody-antigen prediction but also has its own unique possibilities for error.

    One of the current limitations of AlphaFold Multimer is that it doesn't support multi-chain templating. As with antibodies, this is a major issue for the prediction of TCR:pMHC complexes, as the nearest model for a given pMHC, TRAV, or TRBV sequence may be in entirely different files. Bradley's pipeline creates a diverse set of 12 hybrid AlphaFold templates to circumvent this limitation. This approach constrains inter-chain docking and speeds predictions by removing the time-consuming MSA step of the AlphaFold pipeline. The adapted pipeline produces higher-quality models when benchmarked on 20 targets without a close homolog within the training data.

    The challenge to the work is of course not generating predictions but establishing a functional scoring system for the docked poses of the pMHC:TCR and, most importantly, clearly understanding and communicating when modelling has failed. Thus, importantly, Bradley's pipeline shows a strong correlation between its predicted and observed model accuracy. To this end, Bradley uses a receiver operating characteristic curve to discriminate between a TCR's actual antigen and 9 test decoys. This is an interesting testing regime, which appears to function well for the 8 case studies reported. It certainly leaves me wanting to better understand the failure mode for the two outliers - have these correctly modelled the pMHC but failed to dock the TCRs, for example, or vice versa?

    From the analysis in Figure 5 and Figure 5, supplement 2, it looks to me like the pMHC is pretty well modeled in all cases, and the main difference between the working and non-working targets is in the docking of TCR to pMHC. But as the reviewer rightly points out below, binding specificity is likely sensitive to small details of the structure that may not be well captured by these RMSD metrics. With an N of 8, it's hard to make definitive conclusions. As additional systems with ternary structures and TCR repertoires become available, we should be able to provide better answers.
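    For readers unfamiliar with the testing regime: with a single true epitope and nine decoys, the area under the ROC curve reduces to the fraction of decoys ranked below the true epitope. A minimal sketch (the scores here are illustrative, not from the paper):

```python
def auroc(pos_scores, neg_scores):
    """AUROC: probability that a randomly chosen positive outranks a
    randomly chosen negative (ties count half)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# One true epitope against nine decoys; higher score = stronger
# predicted binding (illustrative values only):
true_scores = [0.74]
decoy_scores = [0.31, 0.55, 0.80, 0.42, 0.28, 0.61, 0.47, 0.39, 0.52]
print(auroc(true_scores, decoy_scores))  # 8 of 9 decoys rank below the true epitope
```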

    The real test of the current work, or its future iterations, will be the ability to make predictions from large tetramer-sorted datasets that can then be coupled with experimental testing. The pipeline's current iteration may have some utility here, but future improvements will make for exciting changes to current experimental methods. Overall, the work is a step towards applying structural understanding to the vast amount of next-generation TCR sequence data currently being produced and improves upon current AlphaFold capability.

    I completely agree. I am also excited about using this pipeline for design of TCR sequences with altered specificity and/or enhanced affinity. Even an imperfect in silico specificity prediction method can be a useful filter for designed TCRs (in other words, we want TCR designs that are predicted to have specificity for their intended targets). This has been amply demonstrated for protein fold design, where re-prediction of the structure from the designed sequence provides one of the most powerful quality metrics.

    Reviewer #3 (Public Review):

    This manuscript is well organized, and the author has generally shown good rigor in generating and presenting results. For instance, the author utilized TCRdist and structure-based metrics to remove redundancies and cluster complex structures. Additionally, the consideration of only recent structures (Fig. 2B) and structures that do not overlap with the finetuning dataset (Fig. 2D) is highly warranted.

    In some cases, it seems possible that there may be train/test overlap, including the binding specificity prediction section and results, where native complexes being studied in that section may be closely related to or matching with structures that were previously used by the author to fine-tune the AlphaFold model. This could possibly bias the structure prediction accuracy and should be addressed by the author.

    Other areas of the results and methods require some clarification, including the generation and composition of the hybrid templates, and the benchmark sets shown in some panels of Figure 2. Overall this is a very good manuscript with interesting results, and the author is encouraged to address the specific comments below related to the above concerns.

    1. In the Results section, the statement "visual inspection revealed that many of the predicted models had displaced peptides and/or TCR:pMHC docking modes that were outside the range observed in native proteins" only references Fig. S1. However, with the UMAP representation in that figure, it is difficult for readers to readily see the displaced peptides noted by the author; only two example models are shown in that figure, and neither seems to have displaced peptides. The author should provide more details to support this statement, specifically structures of example models/complexes where the peptide was displaced, and/or summary statistics noting (out of the 130 tested) how many exhibited displaced peptides and aberrant TCR binding modes.

    This is a good point, especially since what constitutes a "displaced peptide" is open to interpretation. I've added an analysis of peptide backbone RMSD (Fig. 2, supplement 2) that should make it possible for readers to assess this more quantitatively using an RMSD threshold (e.g. 10 Å) that makes sense to them.
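    For readers who want to apply their own cutoff, peptide backbone RMSD after MHC superposition can be computed along the following lines (a minimal numpy sketch using a Kabsch fit; not the actual analysis code, and function names are illustrative):

```python
import numpy as np

def kabsch_rotation(P, Q):
    # Optimal rotation (Kabsch algorithm) mapping centered coords P onto
    # centered coords Q, with a reflection correction.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

def peptide_rmsd_after_mhc_superposition(mhc_model, pep_model,
                                         mhc_native, pep_native):
    # Fit the model onto the native structure using MHC backbone atoms
    # only, then measure peptide backbone RMSD in that shared frame.
    mc, nc = mhc_model.mean(axis=0), mhc_native.mean(axis=0)
    R = kabsch_rotation(mhc_model - mc, mhc_native - nc)
    pep_aligned = (pep_model - mc) @ R.T + nc
    return float(np.sqrt(((pep_aligned - pep_native) ** 2).sum(axis=1).mean()))
```

    A model whose peptide RMSD exceeds a chosen threshold (e.g. 10 Å) would then be flagged as having a displaced peptide.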

    2. The template selection protocol described in Figure 1 and in the Results and Methods should be clarified further. It seems that the use of 12 docking geometries in addition to four individual templates for each TCR alpha, TCR beta, and peptide-MHC would lead to a large combinatorial amount of hybrid templates, yet only 12 hybrid templates are described in the text and depicted in Figure 1. It's not clear whether the individual chain templates are randomly assigned within the 12 docking geometries, as an exhaustive combination of individual chains and docking geometries does not seem possible within the 12 hybrid models.

    This was poorly explained; I hope I've clarified it now in the methods. The same four templates for each of the individual chains are used in each of the three AlphaFold runs, only the docking geometries vary between the runs. In other words, not all combinations of chain template and docking geometry are provided to AlphaFold.
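    The assignment can be sketched as follows (all identifiers are illustrative placeholders rather than names from the actual AF_TCR code, and the exact pairing of chain-template sets with docking geometries within a run is assumed for illustration):

```python
# Same 4 templates per chain, reused in every run; only the docking
# geometries change between runs.
chain_templates = {
    "tcr_alpha": ["a1", "a2", "a3", "a4"],
    "tcr_beta":  ["b1", "b2", "b3", "b4"],
    "pmhc":      ["m1", "m2", "m3", "m4"],
}
docking_geometries = [f"dock{i + 1}" for i in range(12)]  # 12 diverse geometries

runs = []
for run_idx in range(3):  # three AlphaFold runs, 4 hybrid templates each
    docks = docking_geometries[run_idx * 4:(run_idx + 1) * 4]
    runs.append([
        {
            "dock": docks[i],
            "tcr_alpha": chain_templates["tcr_alpha"][i],
            "tcr_beta": chain_templates["tcr_beta"][i],
            "pmhc": chain_templates["pmhc"][i],
        }
        for i in range(4)
    ])

# 12 hybrid templates in total, not an exhaustive 4*4*4*12 combination.
total_hybrids = sum(len(r) for r in runs)
```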

    3. Neither the docking RMSD nor the CDR RMSD metrics used in Figure 2 will show whether the peptide is modeled in the MHC groove and in the correct register. This would be an important element to gauge whether the TCR-pMHC interface is correctly modeled, particularly in light of the author's note regarding peptide displacement out of the groove with AlphaFold-Multimer. The author should provide an assessment of the models for peptide RMSD (after MHC superposition), possibly as a scatterplot along with docking RMSD or CDR RMSD to view both the TCR and peptide modeling fidelity of individual models. Otherwise, or in addition, another metric of interface quality that would account for the peptide, such as interface RMSD or CAPRI docking accuracy, could be included.

    This is an excellent suggestion. The new Figure 2, supplement 2, addresses this.

    4. It is not clear what benchmark set is being considered in Fig. 2E and 2F; that should be noted in the figure legend and the Results text. If needed, the author should discuss possible overlap in training and test sets for those results, particularly if the analysis in Fig. 2E and 2F includes the fine-tuned model noted in Fig. 2D and the test set in Fig. 2E and 2F is not the set of murine TCR-pMHC complexes shown in Fig. 2D. Likewise, the set being considered in Fig. 2C (which may possibly be the same set as Fig. 2E and 2F) is not clear based on the figure legend and text.

    This has been fixed. More details below.

    5. The docking accuracy results reported in Fig. 2 do not seem to have a comparison with an existing TCR-pMHC modeling method, even though several of them are currently available. At least for the set of new cases shown in Fig. 2B, it would be helpful for readers to see RMSD results with an existing template-based method as a baseline, for instance, either ImmuneScape (https://sysimm.org/immune-scape/) or TCRpMHCmodels (https://services.healthtech.dtu.dk/service.php?TCRpMHCmodels-1.0; this only appears to model Class I complexes, so Class I-only cases could be considered here).

    This is a great suggestion. We've now added a comparison to TCRpMHCmodels (Fig. 2, supplement 3), which shows that the AlphaFold-based TCR pipeline significantly improves over that baseline method on MHC Class I complexes. Unfortunately, ImmuneScape is not available as a stand-alone software package, and the web interface doesn't allow customization of the template selection process to exclude closely-related templates, which is necessary for benchmarking. Given that ImmuneScape selects a single docking template based on sequence similarity, I compared the AF_TCR dock RMSDs to the dock RMSDs of the closest sequence template (excluding related complexes). This analysis (Fig. 2, supplement 3) shows that AlphaFold modeling produces significantly better docking geometries than simply taking the closest template by sequence similarity.
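    The sequence-similarity baseline amounts to the following (a toy sketch; the real comparison uses proper sequence alignment over the curated template database, and the field names are illustrative):

```python
def closest_template_baseline(target_seq, templates, excluded_ids=()):
    """Baseline: take the docking geometry of the most sequence-similar
    template, excluding closely related complexes; its dock RMSD to the
    native structure is the baseline prediction error.

    templates: iterable of dicts like
      {"id": ..., "seq": ..., "dock_rmsd": ...}
    """
    def identity(a, b):
        # crude ungapped identity; a real comparison would align sequences first
        return sum(x == y for x, y in zip(a, b)) / max(min(len(a), len(b)), 1)

    candidates = [t for t in templates if t["id"] not in excluded_ids]
    return max(candidates, key=lambda t: identity(target_seq, t["seq"]))
```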

    6. As noted in the text, the epitopes noted in Table 1 for the specificity prediction are present in existing structures, and most of those are human epitopes that may have been represented in the AF_TCR finetuning dataset. Were there any controls put in place to prevent the finetuning set from including complexes that are redundant with the TCRs and epitopes being used in the docking-based and specificity predictions if the AF_TCR finetuned model was used in those predictions? For instance, the GILGFVFTL epitope has many known TCR-pMHC structures and the TCRs and TCR-pMHC interfaces are known to have common structural and sequence motifs in those structures. Is it possible that the finetuning dataset included such a complex in its training, which could have influenced the success in Figure 3? The docking RMSD accuracy results in Fig. 5A, where certain epitopes seem to have very accurate docking RMSDs and may have representative complex structures in the AF_TCR finetuning set, may be impacted by this train/test overlap. If so, the author should consider using an altered finetuned model with no train/test overlap for the binding specificity prediction section and results, or else remove the epitopes and TCRs that would be redundant with the complex structures present in the finetuning set.

    This is an excellent point. It wasn't at all clear in the original submission, but the AlphaFold model that was fine-tuned on TCR complexes was only used for the mouse comparison in Fig. 2D (now Fig. 2F), and for exactly the reasons you mention. There is too much overlap between the epitopes with well-characterized repertoires and the epitopes with solved structures. This is also the reason we used the original AlphaFold monomer network, which was only trained on individual protein chains, rather than the AlphaFold multimer network, as the basis of the AF_TCR pipeline. As noted in the discussion, there is still the possibility that individual TCR chain structures in the benchmark or specificity prediction sets were part of the AlphaFold monomer training set, which could make the docking and specificity prediction results look better than they should (though not in Fig. 2B).

    7. The alanine scanning results (Figure 6) do not seem to be validated against any experimental data, so it's not possible to gauge their accuracy. For peptide-MHC targets where there is a clear signal of disruption, it seems to correspond to prominently exposed side chains on the peptide which could likely be detected by a more simplistic structural analysis of the peptide-MHC itself. Thus the utility of the described approach in real-world scenarios (e.g. to detect viral escape mutants) is not clear. It would be helpful if the author can show results for a viral epitope variant (e.g. from one of the influenza epitopes, or the HCV epitope, in Table 1) that is known to disrupt binding for single or multiple TCRs, if such an example is available from the literature.

    This is another great point. For me, the main motivation for the alanine scanning results was to further "stress test" the pipeline to see if it produced plausible results. A particular worry was that the use of pMHC:TCR confidence scores might allow the results to be skewed by peptide-MHC binding strength, rather than the intended TCR:pMHC interaction strength. We've seen in other work that the AlphaFold confidence scores for the peptide are correlated with peptide-MHC affinity. In the AF_TCR specificity predictions, we use the mean binding scores for the "irrelevant" background TCRs to subtract out peptide-intrinsic effects. The fact that we don't see strong signal in Figure 6 at the peptide anchor positions suggests that this is working, at least to some extent. It is also encouraging that the native peptide-MHC has stronger predicted binding than the majority of the alanine variants (excepting the two epitopes with poor performance).

    I agree that comparing the repertoire-level mutation sensitivity predictions to real-world experimental data is challenging, given uncertainty about which TCR clones drive selection for escape, and other viral fitness pressures that influence the escape process. The fact that some of the positions predicted to be most sensitive are also the sites of escape mutations (examples now given in the text) is encouraging. But the new peptide-variant results (Fig. 6, supplement 1) highlight the challenges that remain in discriminating between very similar peptides (especially in the single-TCR setting).
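    The background correction described above, subtracting the mean binding score over irrelevant background TCRs for each peptide, amounts to the following (hypothetical score arrays; not the actual pipeline code):

```python
import numpy as np

def background_adjusted(raw_scores, background_scores):
    """raw_scores: (n_tcrs, n_peptides) predicted binding scores for the
    TCRs of interest; background_scores: (n_bg, n_peptides) scores for
    irrelevant background TCRs. Subtracting the per-peptide background
    mean removes peptide-intrinsic effects (e.g. peptide-MHC binding
    strength), leaving a TCR-specific signal."""
    return raw_scores - background_scores.mean(axis=0, keepdims=True)
```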

    7. The alanine scanning results (Figure 6) do not seem to be validated against any experimental data, so it's not possible to gauge their accuracy. For peptide-MHC targets where there is a clear signal of disruption, it seems to correspond to prominently exposed side chains on the peptide which could likely be detected by a more simplistic structural analysis of the peptide-MHC itself. Thus the utility of the described approach in real-world scenarios (e.g. to detect viral escape mutants) is not clear. It would be helpful if the author can show results for a viral epitope variant (e.g. from one of the influenza epitopes, or the HCV epitope, in Table 1) that is known to disrupt binding for single or multiple TCRs, if such an example is available from the literature.