Preexisting memory CD4 T cells in naïve individuals confer robust immunity upon hepatitis B vaccination

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    By using modern high-throughput sequencing, this paper demonstrates that the antibody-mediated immune responses elicited by vaccination are improved by pre-existing memory CD4 T cell responses. Moreover, the experimental data are an important contribution and may also be useful as a data resource for future research. All reviewers agree that the findings are of great interest. However, some clarifications are still needed in the statistical analyses and validations so that they convincingly support the conclusions.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

This article has been Reviewed by the following groups

Abstract

Antigen recognition through the T cell receptor (TCR) αβ heterodimer is one of the primary determinants of the adaptive immune response. Vaccines activate naïve T cells with high specificity to expand and differentiate into memory T cells. However, antigen-specific memory CD4 T cells exist in unexposed antigen-naïve hosts. In this study, we use high-throughput sequencing of memory CD4 TCRβ repertoire and machine learning to show that individuals with preexisting vaccine-reactive memory CD4 T cell clonotypes elicited earlier and higher antibody titers and mounted a more robust CD4 T cell response to hepatitis B vaccine. In addition, integration of TCRβ sequence patterns into a hepatitis B epitope-specific annotation model can predict which individuals will have an early and more vigorous vaccine-elicited immunity. Thus, the presence of preexisting memory T cell clonotypes has a significant impact on immunity and can be used to predict immune responses to vaccination.

Article activity feed

  1. Author Response:

    Reviewer #1:

    George Elias et al investigated the response of a cohort of individuals to Hepatitis B vaccination and analysed the role of preexisting vaccine-reactive CD4+ memory T cell receptors in the immune response. They found that the presence of these cross-reactive receptors elicits a faster and stronger response in the vaccinees. This is an extremely interesting result, as it suggests that a better understanding of the immune receptor repertoire of an individual can be used to predict and analyse its response to vaccination.

    Strengths:

    The study presents a detailed experimental analysis of the role of CD4+ T cells in the immune response to vaccination.

    The authors show clearly that the dynamics of expansion of memory CD4+ vaccine-specific clones follows the immune response, corroborating the results of previous studies that analysed effector CD4+ cells.

    The authors also asked whether the presence of preexisting vaccine-specific clones impacts the response to vaccination. They found that this is the case. They defined an estimator of a normalized number of putative vaccine-specific clones and showed that it can be used to classify individuals into early or late responders. This result has the potential to be extremely impactful in the way we understand the immune response to vaccination.

    We thank the reviewer for their kind comments.

    Weaknesses:

    This central result follows the definition of the R_{hbs} measure. It is not completely clear how much the numerator and denominator of R_{hbs} contribute to the results and how those bystander and putative receptor sequences have been chosen. Some additional explanations could help reinforce the trust in this specific analysis.

    As indicated by this and other reviewers, the original definition of the R_{hbs} measure was confusing for readers. We have attempted to clarify the definition through changes in the methods, results, figure legends and code base.

    To address this comment specifically, we have extended the discussion of what makes up the numerator and denominator, and how each contributes to the final metric (which is visualized in figure 3b).

    The new text reads:

    "Rhbs is the ratio of the frequency of putative peptide-specific TCRβ divided by a normalization term for putative false positive predictions due to bystander activations in the training data set. This model applied to the memory repertoire at day 60 shows that early-converters tend to have a higher frequency of putative HBsAg peptide-specific TCRβ, while late-converters tend to have relatively more putative false positives as per the normalization term (Fig. 3b)."
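
    As an illustration only, the ratio described in the quoted text can be sketched as follows. This is not the code released with the study: the function names, the pseudocount, and the default cutoff of one mismatch are assumptions made for the example, and the exact normalization used in the manuscript may differ.

```python
from typing import Iterable, Optional


def hamming(a: str, b: str) -> Optional[int]:
    """Hamming distance between two CDR3 strings; None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def matches(cdr3: str, references: Iterable[str], cutoff: int) -> bool:
    """True if cdr3 lies within `cutoff` mismatches of any reference CDR3."""
    for ref in references:
        d = hamming(cdr3, ref)
        if d is not None and d <= cutoff:
            return True
    return False


def r_hbs(repertoire, peptide_specific, bystander, cutoff=1, pseudocount=1):
    """Hypothetical R_hbs: putative peptide-specific hits over bystander hits.

    `cutoff` and `pseudocount` are assumptions of this sketch; the source
    does not state their values.
    """
    n_specific = sum(matches(s, peptide_specific, cutoff) for s in repertoire)
    n_bystander = sum(matches(s, bystander, cutoff) for s in repertoire)
    # The repertoire size cancels in the ratio, so raw counts are used.
    return (n_specific + pseudocount) / (n_bystander + pseudocount)


# Toy sequences, invented for the example.
repertoire = ["CASSLGQETQYF", "CASSLGQETQFF", "CASSPDRGYEQYF", "CAWSVGNTEAFF"]
score = r_hbs(repertoire, ["CASSLGQETQYF"], ["CAWSVGNTEAFF"])
```

    In this toy case two repertoire sequences fall within one mismatch of the peptide-specific reference and one matches the bystander reference, so a higher score corresponds to relatively more putative peptide-specific sequences.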

    It is also not clear if multiple testing correction has been performed in the presentation of the results of Fig5.

    Multiple testing correction is implemented throughout the manuscript. This has now been clarified in the manuscript text.

    The correlation between the number of putative vaccine-reactive CD4+ T cells at day 60 and antibody titers is an interesting and robust result. This however does not support the claim of the authors that preexisting vaccine-specific CD4+ memory cells are associated with a stronger immune response. This could be the case only if a similar correlation would be observed at day 0.

    The reviewer raises an important point. The relationship between the vaccine-reactive CD4+ T-cells and the antibody titer is already a very interesting finding. However, we do already confirm that this signal is also present at day 0 in the CD4+ T-cell memory repertoire. Thus, our results seem to indicate the presence of preexisting vaccine-specific CD4+ memory cells.

    We have clarified this section in the manuscript in the hopes of avoiding similar confusion in other readers. The new section reads:

    "Furthermore, searching for HBsAg peptide-specific clonotypes in the memory repertoires prior to vaccination (day 0) results in a Rhbs with a similar difference (one-sided Wilcoxon-test P value= 0.0010, Fig. 3d). In this manner, the presence of HBsAg peptide-specific clonotypes as represented by the ratio Rhbs can be used as a classifier to distinguish early from late-converters prior to vaccination (Fig. 3e), with an AUC of 0.825 (95% CI: 0.657 – 0.994) in a leave-one-out cross validation setting."

    Reviewer #3:

    This manuscript presents a comprehensive study of CD4 memory T cell receptor beta repertoire response to hepatitis B vaccination, including repertoire correlates of early, late, and non seroconversion, identification of antigen specific and epitope specific clones, and a statistical classifier to potentially predict early Vs late seroconverters based on their pre-vaccination bulk repertoire. The major strengths are a unified experimental and computational analysis of bulk TCR repertoire data with antigen and epitope specific sorted T cells from the same individuals, allowing them to track personalized dynamics of vaccine specific clones, as well as translate across individuals to predict vaccine-induced seroconversion outcomes from pre-vaccination repertoires. The experimental data and reproducible analysis code are publicly accessible, and represent a useful resource that will likely be of interest beyond this study to other immune repertoire researchers.

    The results seem to support the authors' conclusions; however, several reported findings based on statistical analysis are less convincing, and would benefit from improved validation, clarification, or reworking. I next detail these aspects ordered by results sections.

    Section beginning line 128:

    The reported finding of this section is that early-converters (and not late-converters) undergo repertoire remodeling by day 60 post vaccination that decreases repertoire clonality. The evidence presented to support this is a computation of Shannon entropy for day 60 Vs day 0 in each individual, and a paired sample statistical test that is nominally significant for early and not for late converters. However, this nominally significant p value 0.042 is quite marginal, and the associated plots (Fig 2a) indicate only a very modest visual difference, and the presence of a distant outlier. The p value for late converters is not shown, however the marginally significant p value for early converters may not be nominally significant (at alpha 0.05) after multiple test correction (two tests). Additionally, the range of possible entropy values depends on the total sample size, so part of this difference may be driven by sample size. It may be more appropriate to use the Shannon equitability index (normalizing by the maximum possible entropy given the sample size, which is the log of the richness).

    We wish to thank the reviewer for the suggestion of using the Shannon equitability index, which we had not considered before and is indeed highly appropriate for the analysis. This analysis has therefore been rerun with the Shannon equitability index, which has indeed resolved the visual outliers that appeared before. As could be expected, the marginal result that was found before was not sufficiently robust, and the P-value for the late converter increase in Shannon equitability index was now found to be 0.0822 with a Wilcoxon test. These results have therefore been removed; their removal has no impact on the main conclusions of the paper.
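
    For readers unfamiliar with the index, a minimal sketch of the Shannon equitability computation is given below: the Shannon entropy of the clone frequencies divided by the log of the richness, as described by the reviewer. The function name and the convention of returning 0.0 for a repertoire with fewer than two clones (where the index is undefined) are assumptions of this example, not taken from the study code.

```python
import math


def shannon_equitability(clone_counts):
    """Shannon entropy of clone frequencies divided by log(richness)."""
    counts = [c for c in clone_counts if c > 0]
    richness = len(counts)
    if richness < 2:
        # The index is undefined here (log(1) = 0); 0.0 is a convention
        # chosen for this sketch.
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return entropy / math.log(richness)


# Invented clone counts: a perfectly even and a highly clonal repertoire.
even = shannon_equitability([10, 10, 10, 10])
skewed = shannon_equitability([97, 1, 1, 1])
```

    Because the entropy is divided by its maximum for the observed richness, the value stays between 0 and 1 regardless of how many clones were sampled, which is the property the reviewer asked for.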

    Section beginning line 147:

    Ag specific T cells were isolated from day 60 samples and sequenced, allowing the authors to track the dynamics of these clones in the bulk repertoire data across time points. In all vaccinee groups these Ag specific clones are found to increase from day 0 to day 60 in the bulk repertoires. A marginal p value (0.04909) is presented to support early-converters showing more increase in these Ag specific clones. However, statistics comparing early to non or late to non converters are not mentioned (and these would require a multiple test correction on the p value that is discussed).

    This is a valid concern raised by the reviewer. This specific analysis as performed in the original study was lacking power, in large part due to the lack of a concrete null model. Due to this lack of power, we had opted not to include the non-converters in this analysis (as these are only three samples). However, thanks to this reviewer’s suggestion, we were able to rework this analysis with a novel null model (detailed in the next response). This section has therefore been removed and replaced with the new analysis (where a Bonferroni multiple testing correction has been applied for the three responder categories).
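
    The arithmetic of a Bonferroni correction can be sketched as below. The helper name is hypothetical, and the inputs are only the nominal p values quoted in the reviewer's comments; the p values of the reworked analysis are not given in this response, so nothing here reproduces its results.

```python
def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p value: multiply by the number of tests, cap at 1."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return min(1.0, p_value * n_tests)


# Nominal p values quoted in the review, used purely as arithmetic examples.
entropy_p = bonferroni_adjust(0.042, 2)      # two tests (early and late)
expansion_p = bonferroni_adjust(0.04909, 3)  # three responder categories
```

    Both adjusted values exceed 0.05, which is the reviewer's point that marginal nominal p values may not survive correction.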

    A more general difficulty I have with this section is that the null hypothesis isn't made clear, and is probably more subtle and complicated than it appears. Cells are sorted from day 60, then their prevalence is compared between day 0 and day 60. Don't we expect to see more of them in day 60 even if there is no specific expansion for these clonotypes, but just random repertoire churn? My concern here is that double dipping from day 60 is affecting the analysis, since this time point is initially used to define the marker clones in the first place. If you take a random set of day 60 TCRs as null marker clones do you also see they are more prevalent in day 60 Vs day 0, or are you assuming that there should be no difference under the null?

    This is a valid point of criticism raised by the reviewer, and one that was not adequately explored in the previous version of the manuscript. In the prior version, the assumption was that there would be no difference under the null, even though the vaccine-specific clonotypes are derived from samples taken at day 60. While these do originate from a different cellular compartment (memory versus activated cells) and there are several weeks of stimulation experiments that separate the two, it can be argued that the clonotypes will always have more in common with the day 60 samples than with the day 0 samples.

    The establishment of a null model is difficult in this sense. Selecting random clonotypes from day 60, as the reviewer suggests, would establish a baseline based on the overlap between the two time points. This would not be a good comparison, as the impact of day 60 clonotypes would be overinflated (no experimental steps separate them from the day 60 repertoire) and any increase in epitope-specific clonotypes could never exceed this value.

    Therefore, we have added additional experimental data to establish a null model against which to compare. Samples from day 60 were treated in an identical manner as described before, but in the presence of epitope lysates derived from the varicella zoster virus (VZV) instead of the hepatitis B surface antigen. The prevalence of VZV in our study population is near absolute, thus it can be expected that all individuals have built up T-cell immunity against the virus. Moreover, as this is a childhood disease, one can expect the T-cell immunity against varicella, on average, not to change between day 0 and day 60 in our cohort. We postulate that this additional experiment is well suited to establish a null model for the vaccine-specific expansion.

    Using the VZV-specific clonotypes in the comparisons between day 60 and day 0 shows a difference of 1.021 [95% CI: 0.934-1.124]. Thus our assumption that the null model should show no increase seems to be valid, as here too the clonotypes were derived from the day 60 samples. In addition, all reported increases were found to be highly significant when compared to the VZV-derived baseline (e.g. P-value of 6.3e-05 for the HBsAg-specific increase of 2.080).

    The last analysis in this section (presented in Fig 2d) does group-level comparisons of Ag specific clone fractions at day 60. I don't follow why normalizing by the number of Ag-specific clones detected in each individual is correct (i.e. would result in no differences under the null). Here again it could be helpful to see if null marker clone sets (the same size as the true Ag specific sets for each individual) indeed show no significant differences between groups.

    This is an important remark made by the reviewer. In essence, we need to compare the set of marker TCR clonotypes identified in the expansion experiment with the TCR clonotypes found in the CD4+ memory compartment. Thus, we are considering the overlap between two sets of TCR clonotypes for each individual.

    When one has two sets A and B, and wishes to compare the overlap across different comparisons, one has to consider the impact of the sizes of A and B. The larger either A or B is, the larger the expected overlap (even by chance).

    As a specific example tuned to the problem we are addressing in this study: imagine two donors, D1 and D2. Each donor has a sequenced TCR repertoire with 50,000 unique clones (which is on par with the observed values). D1 has 10 Ag-specific clonotypes derived from the stimulation and D2 has 200 Ag-specific clonotypes. These numbers can vary widely due to sampling bias, differences in clonal expansion and differences in immunoprevalence or immunodominance. This is also not considering any bystander (false positive) clonotypes. In any case, consider that the overlap in both cases is 5. This means that we find half of the Ag-specific clonotypes in the memory repertoire of D1, but less than 3% for D2. Thus, despite the overlap being equal, we would argue that the overlap for D1 is more relevant than the overlap for D2. When comparing between individuals, we therefore argue that one must take into account the size of the TCR sets that are being used for the overlap calculation.

    As our sets are not equal in size, i.e. |A| >> |B|, we applied the Szymkiewicz–Simpson coefficient (also known as the Overlap Coefficient), wherein one divides by the size of the smallest set. In our case, the smallest set is always the set of the Ag-specific marker TCR clonotypes. Therefore, in practice, we always normalize by the number of Ag-specific clones.

    This has now been clarified in the paper and the new text reads:

    "In this case, to allow for a between-vaccinees comparison (in contrast to the within-vaccinees timepoint comparison), we calculate the Overlap Coefficient, where HBsAg-specific sequences in the CD4 T-cell memory repertoire are normalized by the number of HBsAg-specific TCRβ found for each vaccinee."
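
    The D1 and D2 example above can be worked through with a short sketch of the Szymkiewicz–Simpson coefficient. The function name and the integer clone identifiers are placeholders invented for the illustration; they stand in for TCRβ clonotypes and are not taken from the study data.

```python
def overlap_coefficient(a: set, b: set) -> float:
    """Szymkiewicz-Simpson coefficient: |A & B| / min(|A|, |B|)."""
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    return len(a & b) / min(len(a), len(b))


# Memory repertoire with 50,000 unique clones, labelled 0..49999.
memory_repertoire = set(range(50_000))

# D1: 10 Ag-specific clonotypes, 5 of which occur in the memory repertoire.
d1_markers = set(range(5)) | set(range(50_000, 50_005))
# D2: 200 Ag-specific clonotypes, 5 of which occur in the memory repertoire.
d2_markers = set(range(5)) | set(range(50_000, 50_195))

d1_score = overlap_coefficient(memory_repertoire, d1_markers)  # 5 / 10
d2_score = overlap_coefficient(memory_repertoire, d2_markers)  # 5 / 200
```

    Both donors share five clonotypes with their memory repertoire, yet the coefficient is 0.5 for D1 and 0.025 for D2, because the smaller set (the Ag-specific markers) sets the denominator.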

    Section beginning line 185:

    In this section, a peptide pool approach is used to identify epitope specific TCRs from each individual at day 60, and a classifier is constructed to discriminate between early and late converter bulk repertoires, using a quantity R_hbs that measures the relative fraction of peptide specific TCRs in the repertoire according to Hamming distance similarity to the peptide specific TCRs. Importantly (as stated in the methods) a cross validation procedure is employed where TCRs from a given individual are not used for classification of that same individual. Since d is Hamming distance on CDR3 sequences, presumably comparisons are only made for TCRs with identical CDR3 lengths. This seems like a limitation, since clones with identical V and J gene, and CDR3 that differ by only one in CDR3 length could very well bind the same epitope. A more TCR-specific distance function, such as the TCRdist of Dash et al., may significantly increase classifier performance.

    The suggestion raised by the reviewer is valid, and one that we had considered when designing the study. There are several methods available for making epitope-specific TCR annotations and our choice was informed by several considerations:

    • Our data consist primarily of beta-chain TCR sequences. Despite the high performance of TCRdist on paired alpha-beta chain TCR data, it has been shown that these approaches do not outperform Hamming distances on beta-chain-only data (Meysman et al., Bioinformatics, 2019). Note that the length restriction may seem severe, but in practice the length of the CDR3 sequence is known to be a strong predictor of epitope preference (De Neuter et al., Immunogenetics, 2018; Meysman et al., Bioinformatics, 2019; Valkiers et al., Bioinformatics, 2021).

    • The TCR repertoire data and the epitope-specific TCR data were extracted using different kits (Adaptive vs QIAGEN) due to differences in starting samples (millions of cells vs thousands of cells) and were processed with different pipelines. It is well established that such processing differences can introduce a bias in the TCRs that are reported. Thus we opted for the simplest method, as it was deemed to be the most robust against any such bias. Methods such as TCRdist have been designed to translate findings from one sample derived from a given experimental setup to another sample with the same setup.

    • The epitope-specific sequences are few in number. We wished to have as high a coverage of the HBs antigen as possible, so opted not to use more advanced methods, which usually place a restriction on the minimum number of input TCRs that are required to build an annotation model. In addition, we wished to use an equivalent model for the bystander sequences (the denominator of the Rhbs metric), which likely involve a set of TCRs targeting multiple epitopes. This invalidates many approaches that require the assumption of an epitope-specific data set.

    Thus we opted for the straightforward Hamming distance approach, which has been applied in several prior studies (notably the look-up functionality of VDJdb uses Hamming distance).

    The origin of this choice was indeed obscure in the previous version of the paper, and has now been clarified.

    There is a distance cutoff parameter c required to define R_hbs. How was this parameter chosen? In particular, if it was tuned to produce the best AUROC, then the cross validation procedure is not legitimate (nested cross validation would be needed, or separate held out test set).

    The reviewer is correct. The distance cutoff c cannot be informed by tuning the AUROC as it would invalidate the cross validation procedure due to information bleed.

    The cutoff was set based on prior research done several years ago on several independent data sets (Meysman et al., Bioinformatics, 2019). This cutoff was kept so as not to bias the current results. Furthermore, the functions used within the code were already highly optimized towards this cutoff (as it allowed a hashing dictionary to be constructed for fast look-up). As the reviewer rightfully points out, the prior version of the code base did not even allow c to be set, as the cutoff was already baked into the search algorithm. This has been clarified in the revision.
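
    To illustrate how a fixed cutoff permits a hashing dictionary for fast look-up, the sketch below indexes reference CDR3 sequences under keys in which one position has been removed. It assumes a cutoff of one mismatch, which is not stated in this response, and it is a generic technique rather than the search algorithm of the study code base; all function names are hypothetical.

```python
from collections import defaultdict


def masked_keys(cdr3: str):
    """One key per position: (index, sequence with that position removed)."""
    return [(i, cdr3[:i] + cdr3[i + 1:]) for i in range(len(cdr3))]


def build_index(references):
    """Map every masked key to the reference CDR3s that produce it."""
    index = defaultdict(set)
    for ref in references:
        for key in masked_keys(ref):
            index[key].add(ref)
    return index


def neighbours(cdr3: str, index) -> set:
    """References of equal length within one mismatch (assumed cutoff)."""
    hits = set()
    for key in masked_keys(cdr3):
        # .get avoids creating empty entries in the defaultdict.
        hits |= index.get(key, set())
    return hits


# Toy reference sequences, invented for the example.
index = build_index(["CASSLGQETQYF", "CAWSVGNTEAFF"])
one_mismatch = neighbours("CASSLGQETQFF", index)
longer_cdr3 = neighbours("CASSLGQETQYFF", index)
```

    Two sequences share a key only if they have the same length and agree everywhere except possibly at the removed position, so each query costs a number of dictionary look-ups equal to the CDR3 length instead of a scan over all references. The price is that the cutoff is fixed by the construction of the index, which is consistent with the remark that c could not be set in the earlier code.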

  2. Evaluation Summary:

    By using modern high-throughput sequencing, this paper demonstrates that the antibody-mediated immune responses elicited by vaccination are improved by pre-existing memory CD4 T cell responses. Moreover, the experimental data are an important contribution and may also be useful as a data resource for future research. All reviewers agree that the findings are of great interest. However, some clarifications are still needed in the statistical analyses and validations so that they convincingly support the conclusions.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

  3. Reviewer #1 (Public Review):

    George Elias et al investigated the response of a cohort of individuals to Hepatitis B vaccination and analysed the role of preexisting vaccine-reactive CD4+ memory T cell receptors in the immune response. They found that the presence of these cross-reactive receptors elicits a faster and stronger response in the vaccinees. This is an extremely interesting result, as it suggests that a better understanding of the immune receptor repertoire of an individual can be used to predict and analyse its response to vaccination.

    Strengths:

    The study presents a detailed experimental analysis of the role of CD4+ T cells in the immune response to vaccination.

    The authors show clearly that the dynamics of expansion of memory CD4+ vaccine-specific clones follows the immune response, corroborating the results of previous studies that analysed effector CD4+ cells.

    The authors also asked whether the presence of preexisting vaccine-specific clones impacts the response to vaccination. They found that this is the case. They defined an estimator of a normalized number of putative vaccine-specific clones and showed that it can be used to classify individuals into early or late responders. This result has the potential to be extremely impactful in the way we understand the immune response to vaccination.

    Weaknesses:

    This central result follows the definition of the R_{hbs} measure. It is not completely clear how much the numerator and denominator of R_{hbs} contribute to the results and how those bystander and putative receptor sequences have been chosen. Some additional explanations could help reinforce the trust in this specific analysis.

    It is also not clear if multiple testing correction has been performed in the presentation of the results of Fig5.

    The correlation between the number of putative vaccine-reactive CD4+ T cells at day 60 and antibody titers is an interesting and robust result. This however does not support the claim of the authors that preexisting vaccine-specific CD4+ memory cells are associated with stronger immune response. This could be the case only if a similar correlation would be observed at day 0.

  4. Reviewer #2 (Public Review):

    By using modern high-throughput sequencing, this paper demonstrates that the antibody-mediated immune responses elicited by vaccination are improved by pre-existing memory CD4 T cell responses. This is important, interesting and novel. These results can only be obtained by interdisciplinary collaborations between clinicians, immunologists and bioinformaticians.

  5. Reviewer #3 (Public Review):

    This manuscript presents a comprehensive study of CD4 memory T cell receptor beta repertoire response to hepatitis B vaccination, including repertoire correlates of early, late, and non seroconversion, identification of antigen specific and epitope specific clones, and a statistical classifier to potentially predict early Vs late seroconverters based on their pre-vaccination bulk repertoire. The major strengths are a unified experimental and computational analysis of bulk TCR repertoire data with antigen and epitope specific sorted T cells from the same individuals, allowing them to track personalized dynamics of vaccine specific clones, as well as translate across individuals to predict vaccine-induced seroconversion outcomes from pre-vaccination repertoires. The experimental data and reproducible analysis code are publicly accessible, and represent a useful resource that will likely be of interest beyond this study to other immune repertoire researchers.

    The results seem to support the authors' conclusions; however, several reported findings based on statistical analysis are less convincing, and would benefit from improved validation, clarification, or reworking. I next detail these aspects ordered by results sections.

    Section beginning line 128:

    The reported finding of this section is that early-converters (and not late-converters) undergo repertoire remodeling by day 60 post vaccination that decreases repertoire clonality. The evidence presented to support this is a computation of Shannon entropy for day 60 Vs day 0 in each individual, and a paired sample statistical test that is nominally significant for early and not for late converters. However, this nominally significant p value 0.042 is quite marginal, and the associated plots (Fig 2a) indicate only a very modest visual difference, and the presence of a distant outlier. The p value for late converters is not shown, however the marginally significant p value for early converters may not be nominally significant (at alpha 0.05) after multiple test correction (two tests). Additionally, the range of possible entropy values depends on the total sample size, so part of this difference may be driven by sample size. It may be more appropriate to use the Shannon equitability index (normalizing by the maximum possible entropy given the sample size, which is the log of the richness).

    Section beginning line 147:

    Ag specific T cells were isolated from day 60 samples and sequenced, allowing the authors to track the dynamics of these clones in the bulk repertoire data across time points. In all vaccinee groups these Ag specific clones are found to increase from day 0 to day 60 in the bulk repertoires. A marginal p value (0.04909) is presented to support early-converters showing more increase in these Ag specific clones. However, statistics comparing early to non or late to non converters are not mentioned (and these would require a multiple test correction on the p value that is discussed).

    A more general difficulty I have with this section is that the null hypothesis isn't made clear, and is probably more subtle and complicated than it appears. Cells are sorted from day 60, then their prevalence is compared between day 0 and day 60. Don't we expect to see more of them in day 60 even if there is no specific expansion for these clonotypes, but just random repertoire churn? My concern here is that double dipping from day 60 is affecting the analysis, since this time point is initially used to define the marker clones in the first place. If you take a random set of day 60 TCRs as null marker clones do you also see they are more prevalent in day 60 Vs day 0, or are you assuming that there should be no difference under the null?

    The last analysis in this section (presented in Fig 2d) does group-level comparisons of Ag specific clone fractions at day 60. I don't follow why normalizing by the number of Ag-specific clones detected in each individual is correct (i.e. would result in no differences under the null). Here again it could be helpful to see if null marker clone sets (the same size as the true Ag specific sets for each individual) indeed show no significant differences between groups.

    Section beginning line 185:

    In this section, a peptide pool approach is used to identify epitope specific TCRs from each individual at day 60, and a classifier is constructed to discriminate between early and late converter bulk repertoires, using a quantity R_hbs that measures the relative fraction of peptide specific TCRs in the repertoire according to Hamming distance similarity to the peptide specific TCRs. Importantly (as stated in the methods) a cross validation procedure is employed where TCRs from a given individual are not used for classification of that same individual. Since d is Hamming distance on CDR3 sequences, presumably comparisons are only made for TCRs with identical CDR3 lengths. This seems like a limitation, since clones with identical V and J gene, and CDR3 that differ by only one in CDR3 length could very well bind the same epitope. A more TCR-specific distance function, such as the TCRdist of Dash et al., may significantly increase classifier performance.

    There is a distance cutoff parameter c required to define R_hbs. How was this parameter chosen? In particular, if it was tuned to produce the best AUROC, then the cross validation procedure is not legitimate (nested cross validation would be needed, or separate held out test set).