Machine-Learning-Assisted Analysis of TCR Profiling Data Unveils Cross-Reactivity between SARS-CoV-2 and a Wide Spectrum of Pathogens and Other Diseases
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
During the last two years, the emergence of SARS-CoV-2 has led to millions of deaths worldwide, with a devastating socio-economic impact on a global scale. The scientific community’s focus has recently shifted towards the association of the T cell immunological repertoire with COVID-19 progression and severity, by utilising T cell receptor sequencing (TCR-Seq) assays. The Multiplexed Identification of T cell Receptor Antigen (MIRA) dataset, which is a subset of the immunoACCESS study, provides thousands of TCRs that can specifically recognise SARS-CoV-2 epitopes. Our study proposes a novel Machine Learning (ML)-assisted approach for analysing TCR-Seq data from the antigens’ point of view, with the ability to unveil key antigens that can accurately distinguish between MIRA COVID-19-convalescent and healthy individuals based on differences in the triggered immune response. Some SARS-CoV-2 antigens were found to exhibit equal levels of recognition by MIRA TCRs in both convalescent and healthy cohorts, leading to the assumption of putative cross-reactivity between SARS-CoV-2 and other infectious agents. This hypothesis was tested by combining MIRA with other public TCR profiling repositories that host assays and sequencing data concerning a plethora of pathogens. Our study provides evidence regarding putative cross-reactivity between SARS-CoV-2 and a wide spectrum of pathogens and diseases, with M. tuberculosis and Influenza virus exhibiting the highest levels of cross-reactivity. These results can potentially shift the emphasis of immunological studies towards an increased application of TCR profiling assays that have the potential to uncover key mechanisms of cell-mediated immune response against pathogens and diseases.
Article activity feed
SciScore for 10.1101/2022.05.10.22274905:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms (each sentence is followed by the resources suggested for it)

Data catalogues derived from three public TCR databases with immunogenetic information were downloaded to investigate the MIRA dataset’s TCR involvement in immune response during other infections.
- MIRA, suggested: (MIRA, RRID:SCR_010731)

All protein sequences were downloaded from UniProt [87] and the cross-reactivity exploration was achieved with custom Python scripts and Circos [88].
- Python, suggested: (IPython, RRID:SCR_001658)

The statistical, dimensionality reduction, ML and feature importance analyses were performed with in-house developed software based on Python’s scipy and scikit-learn as well as R’s dplyr, plyr, GLDEX, TSDT, stats, stringr and ggplot2 libraries.
- Python’s, suggested: (PyMVPA, RRID:SCR_006099)
- scipy, suggested: (SciPy, RRID:SCR_008058)
- scikit-learn, suggested: (scikit-learn, RRID:SCR_002577)
- ggplot2, suggested: (ggplot2, RRID:SCR_014601)

The antigenic epitopes were aligned on reference sequences using Clustal algorithm and the 3-dimensional structure was generated with Jmol, within Jalview software based on 1AA7 (A and B chain view) and 6X29 (A chain view) Protein Data Bank [89] entries for M1 and surface glycoprotein, respectively.
- Jmol, suggested: (Jmol, RRID:SCR_003796)
- Jalview, suggested: (Jalview, RRID:SCR_006459)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
However, there are several limitations related to this study. The MIRA dataset includes a very limited number of samples associated with COVID-19-acute, COVID-19-non-acute and COVID-19-exposed cohorts, thus prohibiting any statistical or ML analysis to potentially connect TCR profile irregularities with disease severity. We did however manage to make statistically significant assumptions using samples derived from COVID-19-convalescent and healthy cohorts, although ideally the number of these samples should be higher. Another limitation relates to the availability of HLA allele information connecting TCR clonotypes and epitopes. The cross-reactivity analysis focused only on the CDR3 amino acid sequence comparison between MIRA dataset and other databases, due to the unavailability of the relevant HLA information for most TCR clonotypes. Therefore, it should be noted that the cross-reactions described here could only take place in individuals with specific HLA alleles, enabling the presentation of relative epitopes to potential long-lived memory T cells developed during previous infection and/or vaccination. Additionally, the extent of cross-reactivity is delimited by the inherent data bias in McPAS, TCR3d and VDJdb, stemming from the scientific community’s focus on specific pathogenic cases. We believe that our study provides a novel ML-based computation framework for analysing TCR-Seq datasets and systematically highlights the breadth and depth of “cross-talk” between antigen...
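The CDR3-only comparison mentioned in the limitations can be illustrated with a minimal sketch. It assumes exact amino acid string matching after trimming and upper-casing, which the source does not specify. The function names and the toy sequences are invented for illustration; they are not the authors' scripts and not real clonotypes.

```python
from collections import defaultdict


def index_by_cdr3(records):
    """Group annotation labels by normalised CDR3 amino acid sequence."""
    index = defaultdict(set)
    for cdr3, label in records:
        # Normalisation (trim + upper-case) is an assumption of this sketch.
        index[cdr3.strip().upper()].add(label)
    return index


def shared_cdr3(mira_records, public_records):
    """Return CDR3 sequences present in both inputs (exact match only).

    Each result maps a CDR3 to (MIRA labels, public-database labels).
    """
    mira = index_by_cdr3(mira_records)
    public = index_by_cdr3(public_records)
    return {
        cdr3: (sorted(mira[cdr3]), sorted(public[cdr3]))
        for cdr3 in mira.keys() & public.keys()
    }


# Toy, invented records: (CDR3 amino acid sequence, label). Not real data.
mira_toy = [
    ("CASSAAAF", "SARS-CoV-2 antigen A"),
    ("CASSGGGF", "SARS-CoV-2 antigen B"),
]
public_toy = [
    ("cassgggf", "pathogen X"),
    ("CASSLLLF", "pathogen Y"),
]

for cdr3, (mira_labels, public_labels) in shared_cdr3(mira_toy, public_toy).items():
    print(cdr3, mira_labels, public_labels)
```

As the quoted limitations point out, a shared CDR3 sequence found this way only suggests putative cross-reactivity, because the HLA context needed to confirm epitope presentation is not part of the comparison.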
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.