TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    The paper by Mayer-Blackwell et al. builds on the idea that TCRs to the same antigen often share sequence similarities, which they quantify using a bespoke tool, TCRdist3. Using this tool they develop the idea of a metaclone, a set of TCRs sharing similarities and potentially recognising the same antigen. They further show that such clonotypes show increased sharing between HLA-related individuals, and explore the use of such clonotypes in characterising antigen-specific immune response across cohorts of individuals. The paper is novel and of interest to a broad range of immunologists interested in repertoire. However, as raised in a series of detailed comments by the reviewers, the message of the paper needs to be sharpened and clearly focused on the development of the metaclone concept (with less emphasis on SARS-CoV-2). The concept of the metaclone itself, which is the main message of the paper, needs to be clarified and characterised.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #3 agreed to share their names with the authors.)


Abstract

T-cell receptors (TCRs) encode clinically valuable information that reflects prior antigen exposure and potential future response. However, despite advances in deep repertoire sequencing, enormous TCR diversity complicates the use of TCR clonotypes as clinical biomarkers. We propose a new framework that leverages experimentally inferred antigen-associated TCRs to form meta-clonotypes – groups of biochemically similar TCRs – that can be used to robustly quantify functionally similar TCRs in bulk repertoires across individuals. We apply the framework to TCR data from COVID-19 patients, generating 1831 public TCR meta-clonotypes from the SARS-CoV-2 antigen-associated TCRs that have strong evidence of restriction to patients with a specific human leukocyte antigen (HLA) genotype. Applied to independent cohorts, meta-clonotypes targeting these specific epitopes were more frequently detected in bulk repertoires compared to exact amino acid matches, and 59.7% (1093/1831) were more abundant among COVID-19 patients that expressed the putative restricting HLA allele (false discovery rate [FDR]<0.01), demonstrating the potential utility of meta-clonotypes as antigen-specific features for biomarker development. To enable further applications, we developed an open-source software package, tcrdist3, that implements this framework and facilitates flexible workflows for distance-based TCR repertoire analysis.

Article activity feed

  1. Author Response:

    Reviewer #1:

    The paper by Mayer-Blackwell et al. builds on the ideas developed in the influential paper by Dash et al. (2017), which defined a similarity matrix for CDR3s, TCRdist, based on a weighted combination of local and global similarity measurements. In this paper they use the metric to develop the idea of a meta-clonotype, a set of similar TCRs which enrich for TCRs directed at the same antigen. They demonstrate that these meta-clonotypes show greater publicity than individual clonotypes, and show evidence of HLA-restriction. The authors speculate that the meta-clonotype may be a useful biomarker. They provide open-access software tools for defining meta-clonotypes in antigen-enriched repertoires.

    The major findings are: (1) Meta-clonotypes are more public than clonotypes, a result which seems not unexpected, given that meta-clonotypes include many different sequences; (2) Meta-clonotypes show evidence of HLA restriction, again predicted given the well-established fact that specific antigens can be recognised by sets of similar TCRs.

    The concept of a meta-clonotype is an interesting one which could have widespread use in analysis of TCR repertoire. However, the impact could be much greater by sharpening the focus of the paper, and adding detail and clarity to the idea of the clonotype. In particular, while the introduction correctly points out that prediction of SARS-CoV-2 clinical outcome, or better understanding of the role of prior coronavirus exposure in determining outcome, are important unanswered questions, this paper does not address these questions.

    Thank you for your careful review of the manuscript. We have submitted a major revision with greater focus on the definition of a TCR meta-clonotype. We have removed from the introduction much of the background about SARS-CoV-2 and potential implications for the pandemic. In its place we’ve added greater detail about meta-clonotypes, how they can be defined from antigen-enriched TCR data, and how they can be used to analyze bulk TCR sequencing data.

    A substantial portion of the paper is devoted to analysing data obtained using the MIRA assay (Klinger et al., PLoS One 10:e0141561) to define SARS-CoV-2 responses, and it is not always clear whether the objective is to evaluate the accuracy of this data set, or to test the power of the meta-clonotype approach.

    Our objective with the analyses of the IMMUNEcode dataset (Nolan et al. 2020), generated using the MIRA method from Klinger et al. 2015, is to demonstrate that TCR meta-clonotypes can be defined from antigen-enriched TCR data and that they can be used to identify and quantify antigen-specific TCRs in bulk sequenced data. Furthermore, the analysis provides evidence that meta-clonotypes have greater publicity than individual clonotypes, thus increasing sensitivity of detection for antigen-specific TCRs. Using bulk repertoires from COVID-19 patients we then demonstrated that population-level analysis can be made possible using meta-clonotypes and provided supporting evidence that the antigen specificity of the centroid TCR is retained. These analyses and their interpretation are further revised on lines 319-348. We think that in the process of evaluating meta-clonotypes, our analysis also shows that the publicly released data contains valuable information about SARS-CoV-2 TCR specificities; however, we have not systematically attempted to verify the validity of the dataset. In the current revision these objectives are made clear in the revised Introduction section.

    Reviewer #2:

    Summary of main aims: The main aim of this paper is to build a framework for TCR meta-clonotypes for finding similar TCRs across individuals (or different repertoires). The majority of the investigations performed in this work have the objective of showing the data properties of meta-clonotypes as well as the metaclonotypes' usefulness for the analysis of antigen-specific TCR data and disease-labeled immune repertoire data.

    Major strengths: Building meta-clonotypes is a possible path towards a better coverage of immune repertoire biology as well as inter-individual repertoire comparison. TCRdist3 is an efficient method for building meta-clonotypes that enables the study of the specific characteristics of meta-clonotypes. So far, clusters of similar sequences have not been investigated in depth. The author team is making a significant step forward in this direction by characterizing meta-clonotypes in differentially antigen-specific-clone-enriched repertoires and by relating the results to generation probability, HLA, sex and immune status.

    Major weaknesses: Although the authors show a significant amount of data, I am not sure if these data convey sufficient intuition about the characteristics and behavior of meta-clonotypes. The authors seem too focused on relating meta-clonotypes to immune status instead of focusing on the specific biological characteristics of metaclonotypes.

    We agree and have shifted the focus of the manuscript away from the results of the application and towards providing greater detail about the characteristics and behaviors of meta-clonotypes; for example, we’ve removed much of the background about SARS-CoV-2 and the COVID-19 cohorts and we’ve added details about how the meta-clonotype radius can be optimized. We’ve also reframed the data analysis section to emphasize that the results demonstrate how meta-clonotypes carry important antigen-specific signals above and beyond individual clonotypes; this makes the results valuable beyond the application to SARS-CoV-2. For example, while demonstrating that HLA restriction occurs in SARS-CoV-2-specific T cell responses in COVID-19 patients is not a surprising finding, it provides evidence that meta-clonotypes enable quantification of an antigen-specific and HLA-restricted T cell response from a bulk single-chain TCR repertoire. We use this example analysis to compare the strength of this signal in individual clonotypes, meta-clonotypes with radius alone and meta-clonotypes with a motif constraint. The revisions of lines 98-100 and lines 461-470 clarify this motivation for the analysis and the interpretation of the results.

    Furthermore, we have added a section to the Results that demonstrates how meta-clonotypes and tcrdist3 enable analyses that can provide biological insights about the biochemical properties that may confer antigen specificity (lines 383-418, Figure 10). Since meta-clonotypes define groups of sequences, we can use CDR3 logo plots to dissect how positions and amino acid properties in the CDR3 define the group. In the revision we demonstrate a “background-adjusted” logo plot, that is able to emphasize amino acids that define the meta-clonotype, yet are uncommon among TCRs using the same V and J genes. Visualizing the results in this way can generate hypotheses that can be experimentally validated about the amino acids that are essential for antigen recognition.

    Furthermore, the authors fail to convincingly show that the background repertoires chosen for meta clonotypes are robust and to what extent meta-clonotypes are sensitive to changes in the background repertoire.

    We agree that it is important to understand how the creation of meta-clonotypes, and specifically optimization of the radius, depends on the background repertoire. Therefore, we conducted sensitivity analyses varying the size (25K to 1M) and makeup (synthetic OLGA vs. cord blood vs. a blend) of the background (lines 259-294). We also empirically demonstrate the value of over-sampling background TCRs with matching V and J genes. We show that a background of 200,000 TCRs was sufficient for reducing the bias and variability in selecting a meta-clonotype radius, compared to a reference set of 2M background TCRs. This is important because, while it is tractable to use large backgrounds for a small number of meta-clonotypes, the smaller background set is sufficient for larger studies or analyses confined to a laptop. We also show that this can largely be attributed to the gain in efficiency that comes with using a background that includes synthetic OLGA TCRs with VJ-gene frequencies that match the TCRs included in the meta-clonotype. However, we note that “Ultimately, the best choice for the background may depend on the question being asked and the data that is available, with factors including donor HLA, age, potential antigen exposures, and other factors that may shape the repertoire.” Our goal with tcrdist3 was to make it easy for the user to customize the background to the scientific question.
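
    The dependence of the radius on the background can be illustrated with a minimal sketch. It assumes the radius is chosen as the largest candidate value for which the fraction of background TCRs falling within the radius stays at or below a fixed limit. The function names, the toy sequences, the limit values, and the mismatch-count distance are all hypothetical stand-ins; the sketch does not use TCRdist or the tcrdist3 API.

```python
from typing import Callable, Iterable, Optional, Sequence


def toy_distance(a: str, b: str) -> int:
    """Stand-in for TCRdist (assumption): mismatches plus length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def largest_radius(
    centroid: str,
    background: Sequence[str],
    candidate_radii: Iterable[int],
    max_background_rate: float,
    dist: Callable[[str, str], int] = toy_distance,
) -> Optional[int]:
    """Return the largest radius whose background hit rate is within the limit.

    Returns None if even the smallest candidate radius exceeds the limit.
    """
    if not background:
        raise ValueError("background must contain at least one sequence")
    dists = [dist(centroid, b) for b in background]
    best = None
    for radius in sorted(candidate_radii):
        rate = sum(d <= radius for d in dists) / len(dists)
        if rate > max_background_rate:
            break  # the hit rate can only grow as the radius grows
        best = radius
    return best


# Hypothetical toy data: one centroid CDR3 and a five-sequence "background".
CENTROID = "CASSLGQAYEQYF"
BACKGROUND = [
    "CASSLGQGYEQYF",
    "CASSPGQGYEQYF",
    "CASSIRSSYEQYF",
    "CASSQDRGTEAFF",
    "CSARDRGNTIYF",
]

if __name__ == "__main__":
    for limit in (0.0, 0.2, 0.4):
        print(limit, largest_radius(CENTROID, BACKGROUND, range(6), limit))
```

    Because the hit rate is estimated from the background sample, a small or unrepresentative background makes the selected radius noisy, which is the sensitivity the analyses above address.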

    The authors also do not convincingly differentiate themselves from previous approaches that have used network analysis and generation probability in order to find clusters of similar sequences (very much conceptually similar to the approach taken here).

    We agree it’s important to communicate how meta-clonotypes differ from existing TCR analysis approaches. There are several important distinctions with the existing methods that use networks and generation probability, namely TCRNET and ALICE. We have highlighted these differences in the Discussion (lines 483-497), quoting:

    “The meta-clonotype approach also differs from methods, such as TCRNET (Ritvo et al., 2018) and ALICE (Pogorelyy et al., 2019), that seek to identify TCRs sharing antigen-specificity within bulk repertoires. These methods were developed to identify TCR nodes in a network with an enriched number of edges compared to the expected number of edges in a background (TCRNET) or derived from a probability model (ALICE). Similarly, another recent method attempts to find antigen-associated sequences in bulk repertoires using a two-stage agglomerative clustering of a k-mer based representation of CDRs, first within and then across bulk repertoires (Yohannes et al., 2021). Our framework is designed for a different task than these algorithms. Specifically, we sought to construct definitions of TCR groupings among already antigen-associated TCRs, which would have high sensitivity and specificity for finding similar TCRs in bulk repertoires. This is an important distinction because the existing network-enrichment methods would simply find that most or all of the TCR groupings among a set of antigen-associated sequences were statistically enriched compared with their frequency in antigen-naïve repertoires. By contrast, a flexible meta-clonotype radius permits the definition of the largest possible group of antigen-associated TCRs with the constraint that the likelihood of finding a TCR within the radius in an antigen-naïve background is equally low across all meta-clonotypes.”

    Finally, the authors do not provide detailed descriptions of how the comparison of meta-clonotypes across repertoires is handled as well as potential sequence redundancies across meta-clonotypes (in potentially different individuals). I believe that all of the perceived shortcomings are readily addressable in a revision.

    You are quite right that many meta-clonotypes are overlapping in that a single TCR might conform to more than one meta-clonotype definition. Thus, in the application of meta-clonotypes to the COVID-19 dataset we tested each meta-clonotype individually for an association with the predicted HLA-restricting genotype. Depending on the context, if a summary across meta-clonotypes is required (e.g. finding the overall abundance of conformant TCRs in a repertoire) it may be appropriate to use meta-clonotypes to identify conformant sequences, but then tally them based on actual abundance (i.e. no double counting). In a prediction context, it may be desirable to have overlapping meta-clonotype features, and in fact many machine learning algorithms excel in this regime. With tcrdist3 we have incorporated a “join” functionality that allows for relational database-style joining of meta-clonotypes with a TCR repertoire; this makes it relatively easy to eliminate or keep redundancies, depending on the context. We have added a sentence to the Discussion pointing out that there is overlap among meta-clonotypes that needs to be considered in their application and we provide a link to an example of how to use the join functionality on https://tcrdist3.readthedocs.io/en/latest/join.html#step-by-step-example.
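
    The difference between per-meta-clonotype tallies and a repertoire-wide tally without double counting can be sketched in plain Python. The identifiers and counts below are hypothetical, and the sketch deliberately avoids the tcrdist3 join functionality; it only assumes that a join has already produced (meta-clonotype, clonotype) match pairs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical join result: which clonotype conforms to which meta-clonotype.
# Clonotype "c1" conforms to two overlapping meta-clonotypes, M1 and M2.
HITS = [("M1", "c1"), ("M2", "c1"), ("M1", "c2"), ("M3", "c3")]

# Hypothetical template counts for each clonotype in one bulk repertoire.
COUNTS = {"c1": 10, "c2": 1, "c3": 5, "c4": 100}


def abundance_per_meta_clonotype(
    hits: Iterable[Tuple[str, str]], counts: Dict[str, int]
) -> Dict[str, int]:
    """Sum clonotype counts within each meta-clonotype (overlaps are kept)."""
    totals: Dict[str, int] = defaultdict(int)
    for meta_id, clone_id in set(hits):  # set() drops repeated pairs
        totals[meta_id] += counts[clone_id]
    return dict(totals)


def overall_conformant_abundance(
    hits: Iterable[Tuple[str, str]], counts: Dict[str, int]
) -> int:
    """Count each conformant clonotype once, however many definitions it matches."""
    conformant = {clone_id for _, clone_id in hits}
    return sum(counts[clone_id] for clone_id in conformant)


if __name__ == "__main__":
    per_meta = abundance_per_meta_clonotype(HITS, COUNTS)
    print(per_meta)
    print(sum(per_meta.values()), overall_conformant_abundance(HITS, COUNTS))
```

    Summing the per-meta-clonotype totals counts "c1" twice, whereas the repertoire-wide tally counts it once; which of the two is appropriate depends on whether overlapping features are wanted, as noted above.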

    All in all, this manuscript is an important step towards a better understanding of immune receptor biology. tcrdist3 is an evolution of a previously published method (tcrdist) that is here used to build meta-clonotypes. After reading the paper, it remains slightly unclear (addressable in a revision) as to how useful they are for understanding repertoire biology as well as how to use them in practice in terms of robustness and sensitivity.

    Thank you for your constructive comments, and I hope we’ve addressed these issues around biological interpretability and application in the revision.

    Reviewer #3:

    Mayer-Blackwell et al introduce a new framework for leveraging antigen-annotated T cell receptor (TCR) sequencing data to search for similar TCRs in bulk repertoire data, which potentially recognise the same antigen peptide. They introduce the notion of meta-clonotype, a T cell receptor (TCR) feature consisting of a main TCR sequence ("centroid") and a distance radius around it (+/- a CDR3 motif), with distance measured according to their previously published TCRdist method (Dash et al, 2017). The meta-clonotypes benefit from increased publicity over exact clonotype matching, and enhance the ability to find potentially relevant TCRs in repertoires from unrelated individuals, which are usually highly diverse, predominantly private, and subject to sampling constraints. The idea of meta-clonotypes is very interesting, and will provide a very useful tool in future repertoire analyses. For example, public databases of annotated TCRs (e.g. VDJdb) can be used to derive the set of meta-clonotypes for a variety of antigens, which can in turn be searched for in bulk repertoire data to identify e.g. memory to previous antigen exposure, immune status etc.
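
    The definition summarised above (a centroid, a distance radius, and optionally a CDR3 motif) can be made concrete with a small sketch. The class and function names are hypothetical, the motif is written as an ordinary regular expression, and the distance is a simple mismatch count standing in for TCRdist, so this is an illustration of the idea rather than the tcrdist3 implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetaClonotype:
    centroid_cdr3: str
    radius: int
    motif: Optional[str] = None  # regular expression over the CDR3, if any


def toy_distance(a: str, b: str) -> int:
    """Stand-in for TCRdist (assumption): mismatches plus length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def conforms(cdr3: str, mc: MetaClonotype) -> bool:
    """True if the CDR3 lies within the radius and matches the motif, if one is set."""
    if toy_distance(cdr3, mc.centroid_cdr3) > mc.radius:
        return False
    return mc.motif is None or re.fullmatch(mc.motif, cdr3) is not None


# Hypothetical meta-clonotypes: radius only, and radius plus motif.
RADIUS_ONLY = MetaClonotype("CASSLGQAYEQYF", radius=2)
RADIUS_MOTIF = MetaClonotype("CASSLGQAYEQYF", radius=2, motif=r"CASS.GQ[AG]YEQYF")

if __name__ == "__main__":
    for seq in ("CASSPGQGYEQYF", "CASSLGQAYEQFF", "CASSIRSSYEQYF"):
        print(seq, conforms(seq, RADIUS_ONLY), conforms(seq, RADIUS_MOTIF))
```

    The second test sequence is within the radius but fails the motif, which shows how the motif constraint narrows a radius-only definition.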

    The tool for performing the analysis, tcrdist3, is open-source, well-documented with instructions and examples, and the statistical analysis has been well-thought out. It is also useful to have the comparison to the current alternative of k-mer based TCR distance (i.e. GLIPH2), and the added flexibility for the user to define the precise distance metric to be used in the tcrdist3 tool.

    The authors then apply their method to analyse TCR beta sequences from COVID-19 datasets that have been publicly released by Adaptive Biotechnologies through the ImmuneRACE project. They use the MIRA set, the peptide-enriched set, to identify the meta-clonotypes, and then search for these in an independent cohort of COVID-19 bulk repertoires from 694 individuals. The authors find that a large proportion of the meta-clonotypes were more abundant in patients expressing the relevant restricting HLA allele, and suggest this could potentially lead to the development of disease biomarkers. The set of SARS-CoV-2-related meta-clonotypes is a useful resource in itself, as researchers generating other COVID-19 TCR datasets will be able to utilize this set of meta-clonotypes to search and potentially stratify patients in their own generated data.

    There are a few areas where further detail / examples would strengthen the paper's claims, in particular in the application of the tcrdist3 method to the COVID-19 data.

    1. Bulk TCR data from repertoires with past antigen exposure are likely to contain varying sizes of clones due to the proliferation of responding T cells and a remaining memory population. Due to the sharp drop in size between a TCR sequencing sample and the entire repertoire, clones above a particular size relative to the sample size are highly unlikely to have been sampled by chance, and identifying significantly/meaningfully expanded clonotypes in a sample is often used to identify a potentially antigen-recognising set of TCRs. The authors demonstrate the detection of meta-clonotypes in the repertoire sets, but it is somewhat unclear how the abundance of a clonotype conforming to a particular meta-clonotype is addressed. For example, there may be rationale for treating the following cases differently: meta-clonotype A is instantiated by (i) a unique clonotype with abundance 1; (ii) a single clonotype with abundance 1000; (iii) 100 different clonotypes (i.e. a "dense neighbourhood" around this meta-clonotype). If used to develop biomarkers, perhaps some degree of granularity in how the frequency/occurrence of meta-clonotypes is calculated would be helpful here.

    Thanks for this helpful suggestion. We agree that the scenarios you outlined above, which differ in the level of clonal breadth (i.e. number of unique clones), may have great immunological relevance. Though we have not specifically assessed the clinical or immunological relevance of clonal breadth vs. clonal frequency, we have noted in the revision (lines 514-519) that there are multiple ways of counting meta-clonotype conformant sequences and multiple ways of aggregating counts across meta-clonotypes, for example without double counting clones that may be conformant to multiple meta-clonotypes. We have also added a documentation page about how to tabulate abundance or breadth of conforming clones: https://tcrdist3.readthedocs.io/en/latest/join.html
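
    The three scenarios raised by the reviewer can be separated by reporting clonal breadth and abundance side by side. The sketch below is a minimal illustration with hypothetical names and counts; it assumes each of the 100 clonotypes in scenario (iii) has abundance 1 and that the repertoire contains 100,000 templates, neither of which is stated in the review.

```python
from typing import Dict, NamedTuple


class Summary(NamedTuple):
    breadth: int      # number of distinct conformant clonotypes
    abundance: int    # total templates across conformant clonotypes
    frequency: float  # abundance as a fraction of the whole repertoire


def summarize(conformant: Dict[str, int], repertoire_total: int) -> Summary:
    """Summarise the clonotypes conforming to one meta-clonotype in one repertoire."""
    abundance = sum(conformant.values())
    return Summary(len(conformant), abundance, abundance / repertoire_total)


# Hypothetical repertoire size and the three scenarios (i), (ii), (iii).
REPERTOIRE_TOTAL = 100_000
SCENARIOS = {
    "i": {"clone_a": 1},
    "ii": {"clone_a": 1000},
    "iii": {f"clone_{i}": 1 for i in range(100)},
}

if __name__ == "__main__":
    for name, conformant in SCENARIOS.items():
        print(name, summarize(conformant, REPERTOIRE_TOTAL))
```

    Scenarios (i) and (ii) share the same breadth but differ in abundance, while scenario (iii) has high breadth with modest abundance, so a single count would conflate cases that a biomarker might need to treat differently.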

    2. The authors focus their analysis on detecting meta-clonotypes from MIRA sets with strong evidence of HLA-restriction. They report 59.7% of these meta-clonotypes were more abundant in patients expressing the corresponding HLA allele. This means that over 40% of meta-clonotypes with strong HLA restriction were more abundant in repertoires with other HLA types. This point could be further elucidated by comparing results with the control repertoires from the COVID-19 set, from MIRA sets with low evidence of HLA restriction, or combining the sets of low and high evidence of HLA restriction (i.e. HLA agnostic results).

    We’d like to clarify that the results do not imply that 40% of meta-clonotypes were more abundant in participants lacking the restricting HLA allele; rather, these meta-clonotypes did not have a significant association with presence or absence of the HLA genotype. In the discussion we highlight several of the possible explanations for this, including that meta-clonotypes were too rarely detected in the population (lines 470-478). The volcano plots in Figures 6A and 7A show that there are very few if any HLA associations of the opposite sign (i.e. meta-clonotypes more abundant in patients without the restricting HLA allele). In fact, at the chosen significance threshold (FDR <0.01), 0 of 1831 predicted HLA-associated meta-clonotypes were significantly negatively associated with the predicted HLA.
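
    The distinction between "not significantly associated" and "negatively associated" can be shown with a short sketch. It assumes a Benjamini-Hochberg adjustment, which is a common way to control the FDR but is not confirmed by the text above, and it uses hypothetical effect sizes and p-values rather than results from the manuscript.

```python
from typing import Dict, List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(
    effects: Sequence[float], pvalues: Sequence[float], fdr: float = 0.01
) -> Dict[str, int]:
    """Count positive, negative, and non-significant associations at the FDR level."""
    tally = {"positive": 0, "negative": 0, "not_significant": 0}
    for effect, q in zip(effects, bh_adjust(pvalues)):
        if q >= fdr:
            tally["not_significant"] += 1
        elif effect > 0:
            tally["positive"] += 1
        else:
            tally["negative"] += 1
    return tally


# Hypothetical per-meta-clonotype effect sizes (sign = direction) and p-values.
EFFECTS = [1.2, 0.9, 0.4, -0.1, 0.05]
PVALUES = [0.0001, 0.002, 0.03, 0.5, 0.8]

if __name__ == "__main__":
    print(classify(EFFECTS, PVALUES))
```

    In this toy example three of five features fail the threshold, yet none is significantly negative; the non-significant features simply carry no detectable association in either direction.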

    3. The MIRA55 set is used as an illustrative example throughout the manuscript, which familiarises the reader with this dataset as they are reading the paper. However, the claims made by the paper about MIRA sets / strong HLA evidence MIRA sets could be strengthened by providing an indication of how measured characteristics of the MIRA55 set compare to the other sets being assessed.

    This is a good point, and we have tried to provide as much information as possible about MIRA55 and the other MIRA sets to help establish that MIRA55 is a representative set. Characteristics of the other MIRA sets appear in Supporting Table S6, including:

    • Input number of clonotypes (AA exact)
    • Number of non-redundant, public meta-clonotypes
    • Clonotypes spanned by at least one meta-clonotype
    • Span (% of clonotypes conforming to a meta-clonotype definition)
    • Number of public enhanced sequences that match an identified TCRβ

    As well as summary statistics for other meta-clonotype properties:

    • Pgen
    • Radius (TCRdist units)
    • TRBV-CDR length
    • Number of MIRA subjects contributing at least 1 sequence

    Furthermore, Table S7 provides the strength of evidence for HLA-restriction for each meta-clonotype, which is then summarized by MIRA set visually in Figure 4.

    Based on these criteria, we think MIRA55 is a reasonably representative set to focus on.

    4. There is some discussion throughout the manuscript about using the SARS-CoV-2 meta-clonotypes to identify differing clinical outcomes such as disease severity. Perhaps the dataset does not have sufficient power to allow for such sub-analysis, but a method of using meta-clonotypes to differentiate between patients based on the occurrence of meta-clonotypes in their repertoire is not provided (e.g. the number of observed clonotypes, the density distribution around clonotypes etc.).

    That is true. With this manuscript we have tried to focus on establishing the methodology, evaluating the strength of the antigen-specific signal and demonstrating its potential applications; we have tried to make these goals more explicit throughout. Specifically in the revision we note that: “Much like any biomarker study, to establish a TCR-based predictor of a particular outcome, the features must be measured among a sufficiently large cohort of individuals, with a sufficient mix of outcomes.” At this time the publicly available ImmuneRACE data lack negative controls and sufficient clinical details to allow for building a predictor of SARS-CoV-2 infection or disease severity.

  2. Evaluation Summary:

    The paper by Mayer-Blackwell et al. builds on the idea that TCRs to the same antigen often share sequence similarities, which they quantify using a bespoke tool, TCRdist3. Using this tool they develop the idea of a metaclone, a set of TCRs sharing similarities and potentially recognising the same antigen. They further show that such clonotypes show increased sharing between HLA-related individuals, and explore the use of such clonotypes in characterising antigen-specific immune response across cohorts of individuals. The paper is novel and of interest to a broad range of immunologists interested in repertoire. However, as raised in a series of detailed comments by the reviewers, the message of the paper needs to be sharpened and clearly focused on the development of the metaclone concept (with less emphasis on SARS-CoV-2). The concept of the metaclone itself, which is the main message of the paper, needs to be clarified and characterised.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #3 agreed to share their names with the authors.)

  3. Reviewer #1 (Public Review):

    The paper by Mayer-Blackwell et al. builds on the ideas developed in the influential paper by Dash et al. (2017), which defined a similarity matrix for CDR3s, TCRdist, based on a weighted combination of local and global similarity measurements. In this paper they use the metric to develop the idea of a meta-clonotype, a set of similar TCRs which enrich for TCRs directed at the same antigen. They demonstrate that these meta-clonotypes show greater publicity than individual clonotypes, and show evidence of HLA-restriction. The authors speculate that the meta-clonotype may be a useful biomarker. They provide open-access software tools for defining meta-clonotypes in antigen-enriched repertoires.

    The major findings are: (1) Meta-clonotypes are more public than clonotypes, a result which seems not unexpected, given that meta-clonotypes include many different sequences; (2) Meta-clonotypes show evidence of HLA restriction, again predicted given the well-established fact that specific antigens can be recognised by sets of similar TCRs.

    The concept of a meta-clonotype is an interesting one which could have widespread use in analysis of TCR repertoire. However, the impact could be much greater by sharpening the focus of the paper, and adding detail and clarity to the idea of the clonotype. In particular, while the introduction correctly points out that prediction of SARS-CoV-2 clinical outcome, or better understanding of the role of prior coronavirus exposure in determining outcome, are important unanswered questions, this paper does not address these questions.

    A substantial portion of the paper is devoted to analysing data obtained using the MIRA assay (Klinger et al., PLoS One 10:e0141561) to define SARS-CoV-2 responses, and it is not always clear whether the objective is to evaluate the accuracy of this data set, or to test the power of the meta-clonotype approach.

  4. Reviewer #2 (Public Review):

    Summary of main aims: The main aim of this paper is to build a framework for TCR meta-clonotypes for finding similar TCRs across individuals (or different repertoires). The majority of the investigations performed in this work have the objective of showing the data properties of meta-clonotypes as well as the metaclonotypes' usefulness for the analysis of antigen-specific TCR data and disease-labeled immune repertoire data.

    Major strengths: Building meta-clonotypes is a possible path towards a better coverage of immune repertoire biology as well as inter-individual repertoire comparison. TCRdist3 is an efficient method for building meta-clonotypes that enables the study of the specific characteristics of meta-clonotypes. So far, clusters of similar sequences have not been investigated in depth. The author team is making a significant step forward in this direction by characterizing meta-clonotypes in differentially antigen-specific-clone-enriched repertoires and by relating the results to generation probability, HLA, sex and immune status.

    Major weaknesses: Although the authors show a significant amount of data, I am not sure if these data convey sufficient intuition about the characteristics and behavior of meta-clonotypes. The authors seem too focused on relating meta-clonotypes to immune status instead of focusing on the specific biological characteristics of metaclonotypes. Furthermore, the authors fail to convincingly show that the background repertoires chosen for meta clonotypes are robust and to what extent meta-clonotypes are sensitive to changes in the background repertoire. The authors also do not convincingly differentiate themselves from previous approaches that have used network analysis and generation probability in order to find clusters of similar sequences (very much conceptually similar to the approach taken here). Finally, the authors do not provide detailed descriptions of how the comparison of meta-clonotypes across repertoires is handled as well as potential sequence redundancies across meta-clonotypes (in potentially different individuals). I believe that all of the perceived shortcomings are readily addressable in a revision.

    All in all, this manuscript is an important step towards a better understanding of immune receptor biology. tcrdist3 is an evolution of a previously published method (tcrdist) that is here used to build meta-clonotypes. After reading the paper, it remains slightly unclear (addressable in a revision) as to how useful they are for understanding repertoire biology as well as how to use them in practice in terms of robustness and sensitivity.

  5. Reviewer #3 (Public Review):

    Mayer-Blackwell et al introduce a new framework for leveraging antigen-annotated T cell receptor (TCR) sequencing data to search for similar TCRs in bulk repertoire data, which potentially recognise the same antigen peptide. They introduce the notion of meta-clonotype, a T cell receptor (TCR) feature consisting of a main TCR sequence ("centroid") and a distance radius around it (+/- a CDR3 motif), with distance measured according to their previously published TCRdist method (Dash et al, 2017). The meta-clonotypes benefit from increased publicity over exact clonotype matching, and enhance the ability to find potentially relevant TCRs in repertoires from unrelated individuals, which are usually highly diverse, predominantly private, and subject to sampling constraints. The idea of meta-clonotypes is very interesting, and will provide a very useful tool in future repertoire analyses. For example, public databases of annotated TCRs (e.g. VDJdb) can be used to derive the set of meta-clonotypes for a variety of antigens, which can in turn be searched for in bulk repertoire data to identify e.g. memory to previous antigen exposure, immune status etc.

    The tool for performing the analysis, tcrdist3, is open-source, well-documented with instructions and examples, and the statistical analysis has been well-thought out. It is also useful to have the comparison to the current alternative of k-mer based TCR distance (i.e. GLIPH2), and the added flexibility for the user to define the precise distance metric to be used in the tcrdist3 tool.

    The authors then apply their method to analyse TCR beta sequences from COVID-19 datasets that have been publicly released by Adaptive Biotechnologies through the immuneRACE project. They use the MIRA set, the peptide-enriched set, to identify the meta-clonotypes, and then search for these in an independent cohort of COVID-19 bulk repertoires from 694 individuals. The authors find that a large proportion of the meta-clonotypes were more abundant in patients expressing the relevant restricting HLA allele, and suggest this could potentially lead to the development of disease biomarkers. The set of SARS-CoV-2 related meta-clonotypes is a useful resource in itself, as researchers generating other COVID-19 TCR datasets will be able to use this set to search and potentially stratify patients in their own data.

    There are a few areas where further detail or examples would strengthen the paper's claims, in particular in the application of the tcrdist3 method to the COVID-19 data.

    1. Bulk TCR data from repertoires with past antigen exposure are likely to contain varying sizes of clones due to the proliferation of responding T cells and a remaining memory population. Due to the sharp drop in size between a TCR sequencing sample and the entire repertoire, clones above a particular size relative to the sample size are highly unlikely to have been sampled by chance, and identifying significantly/meaningfully expanded clonotypes in a sample is often used to identify a potentially antigen-recognising set of TCRs. The authors demonstrate the detection of meta-clonotypes in the repertoire sets, but it is somewhat unclear how the abundance of a clonotype conforming to a particular meta-clonotype is addressed. For example, there may be rationale for treating the following cases differently: meta-clonotype A is instantiated by (i) a unique clonotype with abundance 1; (ii) a single clonotype with abundance 1000; (iii) 100 different clonotypes (i.e. a "dense neighbourhood" around this meta-clonotype). If used to develop biomarkers, perhaps some degree of granularity in how the frequency/occurrence of meta-clonotypes is calculated would be helpful here.

    2. The authors focus their analysis on detecting meta-clonotypes from MIRA sets with strong evidence of HLA-restriction. They report 59.7% of these meta-clonotypes were more abundant in patients expressing the corresponding HLA allele. This means that over 40% of meta-clonotypes with strong HLA restriction were more abundant in repertoires with other HLA types. This point could be further elucidated by comparing results with the control repertoires from the COVID-19 set, from MIRA sets with low evidence of HLA restriction, or combining the sets of low and high evidence of HLA restriction (i.e. HLA agnostic results).

    3. The MIRA55 set is used as an illustrative example throughout the manuscript, which familiarises the reader with this dataset as they are reading the paper. However, the claims made by the paper about MIRA sets / strong HLA evidence MIRA sets could be strengthened by providing an indication of how measured characteristics of the MIRA55 set compare to the other sets being assessed.

    4. There is some discussion throughout the manuscript about using the SARS-CoV-2 meta-clonotypes to identify differing clinical outcomes such as disease severity. Perhaps the dataset does not have sufficient power to allow for such sub-analysis, but a method of using meta-clonotypes to differentiate between patients based on the occurrence of meta-clonotypes in their repertoire is not provided (e.g. the number of observed clonotypes, the density distribution around clonotypes, etc.).
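
    As an illustration of the granularity raised in point 1 (and the per-patient summaries mentioned in point 4), the sketch below summarises the clonotypes conforming to one meta-clonotype in several ways at once. It is a hypothetical example, not the authors' method: the `Summary` fields and the `summarize_hits` function are names invented here, and the input is assumed to be a mapping from clonotype identifier to abundance for clonotypes already found to conform.

```python
from collections import namedtuple

# Hypothetical summary record; field names are assumptions made for illustration.
Summary = namedtuple(
    "Summary", ["present", "n_clonotypes", "total_abundance", "max_abundance"]
)


def summarize_hits(hits):
    """Summarise conformant clonotypes for one meta-clonotype in one repertoire.

    hits: dict mapping clonotype id -> abundance (e.g. template count).
    """
    counts = list(hits.values())
    return Summary(
        present=bool(counts),             # binary occurrence
        n_clonotypes=len(counts),         # breadth: distinct conformant clonotypes
        total_abundance=sum(counts),      # depth: summed abundance
        max_abundance=max(counts, default=0),  # largest single clone
    )


# The three cases described in point 1.
case_i = summarize_hits({"c1": 1})                            # one clonotype, abundance 1
case_ii = summarize_hits({"c1": 1000})                        # one clonotype, abundance 1000
case_iii = summarize_hits({f"c{i}": 1 for i in range(100)})   # 100 distinct clonotypes
```

    All three cases are identical under binary occurrence, whereas the breadth and depth fields separate an expanded single clone (case ii) from a dense neighbourhood (case iii).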

  6. SciScore for 10.1101/2020.12.24.424260:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    TCR distances: Weighted multi-CDR distances between TCRs were computed in a tcrdist3, a open-source Python3 package for TCR repertoire analysis and visualization, using the procedure first described in (Dash et al., 2017). | Python3 (suggested: None)
    Thus we initially attempted to fit a negative binomial model to the data (e.g. DESEQ2 (Love et al., 2013)). | DESEQ2 (suggested: DESeq2, RRID:SCR_015687)

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.