Predicting specificity of TCR-pMHC interactions using machine learning and biophysical models
Abstract
Understanding the mechanisms of T-cell activation and of T-cell receptor (TCR) discrimination of MHC-presented epitope peptides (pMHCs) remains an open problem. Machine learning (ML)-based prediction of TCR specificity has recently gained considerable attention. However, the ability of current models to generalize to peptides unseen during training has not been established. Here, we use a proprietary cancer-patient data set that profiles TCR binding to novel regions of peptide space to show that peptide generalization remains an unsolved problem. Specifically, we show that while ML methods have demonstrable utility in predicting TCR specificity for known peptides, they fail to generalize to novel peptides. We also show that physics-based methods using classical energy functions outperform ML methods when predicting TCR binding to novel peptides but underperform them on known peptides. In light of these observations, we develop a new ML model that leverages general knowledge acquired by protein foundation models to achieve performance better than or comparable to that of either ML or biophysical methods on both in-distribution and out-of-distribution TCR-pMHC specificity prediction. We further analyze model performance as a function of the distance in TCR sequence specificity between the training and test sets, which quantitatively characterizes the generalization potential of any given TCR-pMHC model. Our analysis sheds light on the current state of TCR-pMHC interaction modeling and suggests new directions for method development and data acquisition.
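The abstract describes stratifying model performance by how far each test TCR lies from the training set. The sketch below illustrates that idea under loudly stated assumptions: it uses Levenshtein edit distance between CDR3 strings as a stand-in for the paper's TCR distance, and per-example accuracy as a stand-in for whatever performance metric the authors report. The functions `levenshtein`, `nearest_train_distance`, and `accuracy_by_distance` are hypothetical helpers written for illustration, not the authors' code.

```python
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def nearest_train_distance(test_seq: str, train_seqs: list[str]) -> int:
    """Distance from a test sequence to its closest training sequence (assumed metric)."""
    return min(levenshtein(test_seq, s) for s in train_seqs)


def accuracy_by_distance(test_records, train_seqs):
    """Group test predictions by nearest-training distance and report accuracy per group.

    test_records: iterable of (cdr3, y_true, y_pred) tuples (hypothetical format).
    Returns {distance: accuracy}, sorted by distance.
    """
    buckets: dict[int, list[float]] = {}
    for cdr3, y_true, y_pred in test_records:
        d = nearest_train_distance(cdr3, train_seqs)
        buckets.setdefault(d, []).append(1.0 if y_true == y_pred else 0.0)
    return {d: mean(v) for d, v in sorted(buckets.items())}


train = ["CASSL", "CASRG"]
test = [("CASSL", 1, 1), ("CASSA", 0, 1), ("GGGGG", 1, 0)]
print(accuracy_by_distance(test, train))  # {0: 1.0, 1: 0.0, 4: 0.0}
```

A curve of accuracy (or AUC) against distance makes the in- versus out-of-distribution gap visible: a model that only memorizes would score well near distance 0 and degrade quickly as distance grows.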