Assessing data size requirements for training generalizable sequence-based TCR specificity models via pan-allelic MHC-I non-self ligandome evaluation

Abstract

Rapidly identifying which T cell receptors (TCRs) specifically bind patient-unique neoepitopes is a critical challenge for personalized TCR cell therapy in oncology. Due to the enormous diversity of both the TCR and neoepitope repertoires, a machine learning predictor of TCR-pMHC specificity for personalized therapy must generalize to TCRs and epitopes not seen in the training data. Here we provide the first estimate of the training data size such a model requires. We first show that published models fail to generalize beyond a single-residue dissimilarity from the epitope distribution of their training sets. We then impute the possible mutated ligandome across the 34 most prevalent human MHC-I alleles and represent it as a graph whose edges follow our established dissimilarity cutoff. By finding a dominating set of this graph, we estimate that between one million and 100 million epitopes are required to train a generalizable sequence-based TCR specificity prediction model, roughly 1000 times the size of current public data.

*Antoine Delaunay & Miles McGibbon contributed equally to this work.
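The graph-and-dominating-set step can be illustrated with a short sketch. The Python below is a hypothetical, minimal reconstruction, not the authors' code: it uses Hamming distance as a stand-in for the paper's dissimilarity measure, a single-residue cutoff matching the generalization limit reported above, and a standard greedy approximation of the minimum dominating set. The toy ligandome and all names are illustrative.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of mismatched residues between two equal-length peptides."""
    return sum(x != y for x, y in zip(a, b))

def build_adjacency(epitopes, cutoff=1):
    """Connect epitopes whose dissimilarity is within the cutoff.

    Hamming distance over equal-length peptides is an assumption here,
    standing in for whatever dissimilarity measure the paper establishes.
    """
    adj = {e: {e} for e in epitopes}  # each node dominates itself
    for a, b in combinations(epitopes, 2):
        if len(a) == len(b) and hamming(a, b) <= cutoff:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def greedy_dominating_set(adj):
    """Greedy approximation of a minimum dominating set: repeatedly
    pick the node that covers the most still-uncovered nodes."""
    uncovered = set(adj)
    dominating = []
    while uncovered:
        best = max(adj, key=lambda n: len(adj[n] & uncovered))
        dominating.append(best)
        uncovered -= adj[best]
    return dominating

# Toy ligandome of 9-mers; the real analysis spans the imputed
# mutated ligandome across 34 prevalent MHC-I alleles.
ligandome = ["SIINFEKLV", "SIINFEKLA", "GILGFVFTL", "GILGFVFTV", "NLVPMVATV"]
cover = greedy_dominating_set(build_adjacency(ligandome, cutoff=1))
print(f"{len(cover)} epitopes dominate a ligandome of {len(ligandome)}")
```

The size of the resulting set is the quantity of interest: every imputed epitope then lies within the dissimilarity cutoff of at least one training epitope, so the cover size bounds how many distinct epitopes a training set must contain for a model to stay within its demonstrated generalization radius. The greedy heuristic is a common choice because exact minimum dominating set is NP-hard.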
