TCR2HLA: calibrated inference of HLA genotypes from TCR repertoires enables identification of immunologically relevant metaclonotypes

Abstract

T cell receptors (TCRs) recognize peptides presented by polymorphic human leukocyte antigen (HLA) molecules, but HLA genotype data are often missing from TCR repertoire sequencing studies. To address this, we developed TCR2HLA, an open-source tool that infers HLA genotypes from TCRβ repertoires. Expanding on work linking public TRBV-CDR3 sequences to HLA genotypes, we incorporated “quasi-public” metaclonotypes, which are composed of rarer TCRβ sequences with shared amino acid features and are enriched in individuals carrying particular HLA genotypes. Using four TCRβ-seq datasets from 3,150 individuals, we applied TRBV gene partitioning and locality-sensitive hashing to 71 million input TCRs and identified ∼96,000 TCRβ features strongly associated with specific HLA alleles. Binary HLA classifiers built with these features achieved high balanced accuracy (>0.9) for 9 of 12 common HLA-A, 9 of 12 HLA-B, 6 of 13 HLA-C, and 11 of 11 HLA-DRB1 alleles, as well as for 6 of 10 prevalent DPA1/DPB1 and 8 of 17 DQA1/DQB1 heterodimers. We also introduced a high-sensitivity calibration to support predictions in samples with as few as 5,000 unique clonotypes. Filtering calibrated predictions by confidence further improved reliability. Beyond genotype imputation, TCR2HLA enables the discovery of novel HLA- and exposure-associated TCRs, as shown by the identification of SARS-CoV-2-related TCRs in a large COVID-19 dataset lacking HLA data. TCR2HLA provides a scalable framework for bridging the gap between TCR-seq data and HLA genotype for biomarker discovery.
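To make the idea of TRBV partitioning combined with hashing more concrete, the following is a minimal illustrative sketch and not the TCR2HLA implementation. It assumes clonotypes are given as (sample, TRBV gene, CDR3) tuples and swaps in a simple wildcard-masking hash as a stand-in for locality-sensitive hashing: CDR3s that share a TRBV gene and differ at one position fall into the same bucket. The function names and the input layout are hypothetical.

```python
from collections import defaultdict


def masked_keys(cdr3):
    """Yield one key per position, with that residue replaced by a wildcard."""
    for i in range(len(cdr3)):
        yield cdr3[:i] + "*" + cdr3[i + 1:]


def group_quasi_public(clonotypes):
    """Bucket near-identical CDR3s within each TRBV gene (hypothetical helper).

    clonotypes: iterable of (sample_id, trbv, cdr3) tuples (assumed layout).
    Returns {(trbv, masked_cdr3): {"cdr3s": set, "samples": set}} keeping only
    buckets that contain at least two distinct CDR3 sequences.
    """
    buckets = defaultdict(lambda: {"cdr3s": set(), "samples": set()})
    for sample_id, trbv, cdr3 in clonotypes:
        # Partition by TRBV gene first, then hash every single-position mask.
        for key in masked_keys(cdr3):
            bucket = buckets[(trbv, key)]
            bucket["cdr3s"].add(cdr3)
            bucket["samples"].add(sample_id)
    return {k: v for k, v in buckets.items() if len(v["cdr3s"]) > 1}
```

In a sketch like this, each surviving bucket would be a candidate feature whose per-sample presence could then be tested for association with an HLA allele; the association test and the classifier itself are outside the scope of this example.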
