gAIRR-wgs: An Algorithm to Genotype T Cell Receptor Alleles Using Whole Genome Sequencing Data
Abstract
T cell receptor (TR) genes, including variable (TR_V), diversity (TR_D), and joining (TR_J) segments, exhibit allelic diversity that is critical to adaptive immunity. Growing evidence links TR genes to immune-related diseases. Germline variants may influence TR gene function and subsequent usage, which makes accurate TR allele profiling important. However, accurately identifying germline TR alleles from standard whole-genome sequencing (WGS) data remains challenging because of short read lengths, limited depth, and high sequence similarity. To address these challenges, we developed gAIRR-wgs, a pipeline for WGS-based TR allele typing. By incorporating novel alleles from Human Pangenome Reference Consortium (HPRC) individuals, gAIRR-wgs achieved F1 scores of 100.0% for TR_D, 99.8% for TR_J, and 98.3% for TR_V in allele calling. Applying this pipeline to 1,492 individuals from the Taiwan Biobank (TWB), we identified 449 novel TR alleles, 277 of which were also found in HPRC release 1 data from individuals of mixed ethnicity and are absent from the IMGT database. Further population comparison revealed significant differences in TR allele distribution across global populations, with population-specific patterns and differing levels of diversity between ethnic groups. We also discovered TWB-specific deletion polymorphisms affecting contiguous TRGV and TRBV genes, which are not recorded in the gnomAD database and are not detected by standard structural variant callers, highlighting the need for tailored approaches to resolve complex immune gene regions. In conclusion, gAIRR-wgs enables accurate TR allele calling from standard WGS data using feasible computational resources and reveals substantial immunogenetic diversity in population cohorts.
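As a rough illustration of how an F1 score for allele calling can be computed, the sketch below compares a set of called alleles with a truth set for one sample. It is not the evaluation code of gAIRR-wgs: the function name `allele_f1` and the allele labels are hypothetical, and the sketch assumes each allele is counted once per sample, so homozygous copies are not distinguished.

```python
def allele_f1(called, truth):
    """Return (precision, recall, F1) for one sample's allele calls.

    Assumption: alleles are compared as sets, so each allele name
    counts once regardless of how many copies the sample carries.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # alleles called and present in the truth set
    fp = len(called - truth)   # alleles called but absent from the truth set
    fn = len(truth - called)   # true alleles that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative labels only; they are not results from the study.
truth = {"TRBV7-2*01", "TRBV7-2*02", "TRAJ24*01", "TRBD1*01"}
called = {"TRBV7-2*01", "TRAJ24*01", "TRBD1*01", "TRAJ24*02"}

precision, recall, f1 = allele_f1(called, truth)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case three alleles are called correctly, one call is spurious, and one true allele is missed, so precision, recall, and F1 are all 0.75. A per-segment score such as those reported for TR_D, TR_J, and TR_V would apply the same arithmetic to the alleles of that segment type.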