Statistical inference reveals the role of length, GC content, and local sequence in V(D)J nucleotide trimming

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    Russel et al. study sequence-based factors that may drive VDJ trimming, a mechanism involved in VDJ recombination that shapes adaptive immune repertoire generation, and present compelling evidence for them. The work is based on a rigorous statistical comparison of logistic regression models to reveal the role and function of cutting enzymes in shaping T- and B-cell receptor diversity. It could provide fundamental new insights into these processes, although some claims are currently incomplete.

Abstract

To appropriately defend against a wide array of pathogens, humans somatically generate highly diverse repertoires of B cell and T cell receptors (BCRs and TCRs) through a random process called V(D)J recombination. Receptor diversity is achieved during this process through both the combinatorial assembly of V(D)J-genes and the junctional deletion and insertion of nucleotides. While the Artemis protein is often regarded as the main nuclease involved in V(D)J recombination, the exact mechanism of nucleotide trimming is not understood. Using a previously published TCRβ repertoire sequencing data set, we have designed a flexible probabilistic model of nucleotide trimming that allows us to explore various mechanistically interpretable sequence-level features. We show that local sequence context, length, and GC nucleotide content in both directions of the wider sequence, together, can most accurately predict the trimming probabilities of a given V-gene sequence. Because GC nucleotide content is predictive of sequence-breathing, this model provides quantitative statistical evidence regarding the extent to which double-stranded DNA may need to be able to breathe for trimming to occur. We also see evidence of a sequence motif that appears to get preferentially trimmed, independent of GC-content-related effects. Further, we find that the inferred coefficients from this model provide accurate prediction for V- and J-gene sequences from other adaptive immune receptor loci. These results refine our understanding of how the Artemis nuclease may function to trim nucleotides during V(D)J recombination and provide another step toward understanding how V(D)J recombination generates diverse receptors and supports a powerful, unique immune response in healthy humans.
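As a rough illustration of the kind of model the abstract describes, the sketch below scores each candidate trimming site of a gene sequence from its local sequence window, the GC content on either side of the cut, and the trimming length, and then normalizes the scores into probabilities. It is not the authors' implementation. The function names, the window sizes (`motif_flank`, `gc_window`), the softmax normalization over candidate sites, and the weights passed in by the caller are all assumptions made for this example, and the motif window is extracted but not scored.

```python
import math

def gc_fraction(seq):
    """Fraction of G/C bases in seq; 0.0 for an empty string."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)

def site_features(gene_seq, n_trim, motif_flank=2, gc_window=10):
    """Features for trimming n_trim nucleotides from the 3' end of gene_seq.

    Window sizes are hypothetical defaults, not values from the article.
    """
    cut = len(gene_seq) - n_trim  # index of the first trimmed base
    return {
        # bases immediately surrounding the cut (local sequence context)
        "motif": gene_seq[max(0, cut - motif_flank):cut + motif_flank],
        # GC content of the retained side and of the trimmed side
        "gc_5prime": gc_fraction(gene_seq[max(0, cut - gc_window):cut]),
        "gc_3prime": gc_fraction(gene_seq[cut:cut + gc_window]),
        "trim_length": n_trim,
    }

def trimming_probabilities(gene_seq, weights, max_trim=5):
    """Softmax over candidate trimming amounts 0..max_trim.

    weights is a dict of illustrative coefficients; the softmax form is
    an assumption of this sketch.
    """
    scores = []
    for n_trim in range(max_trim + 1):
        f = site_features(gene_seq, n_trim)
        scores.append(
            weights["gc_5prime"] * f["gc_5prime"]
            + weights["gc_3prime"] * f["gc_3prime"]
            + weights["trim_length"] * f["trim_length"]
        )
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Calling `trimming_probabilities("TGTGCCAGCAGT", {"gc_5prime": -1.0, "gc_3prime": 0.5, "trim_length": -0.2})` with these made-up weights returns six probabilities, one per trimming amount from 0 to 5, that sum to 1.
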


  1. eLife assessment

    Russel et al. study sequence-based factors that may drive VDJ trimming, a mechanism involved in VDJ recombination that shapes adaptive immune repertoire generation, and present compelling evidence for them. The work is based on a rigorous statistical comparison of logistic regression models to reveal the role and function of cutting enzymes in shaping T- and B-cell receptor diversity. It could provide fundamental new insights into these processes, although some claims are currently incomplete.

  2. Reviewer #1 (Public Review):

    A quantitative understanding of the mechanisms underlying VDJ recombination is a prerequisite for a better understanding of adaptive immune repertoire generation. Here, Russel et al. study potential sequence-based factors that may drive VDJ trimming, a mechanism involved in VDJ recombination. This work provides a significant advance in the statistical modeling of immune repertoire generation.

    Using a previously published TCRβ repertoire sequencing data set, the authors designed a probabilistic model of nucleotide trimming that allows the exploration of various mechanistically interpretable sequence-level features. Using this model, they show that local sequence context and the capacity for sequence-breathing, together, can most accurately predict the trimming probabilities of a given V-gene sequence. Their model suggests that double-stranded DNA needs to be able to "breathe" for trimming to occur and provides evidence of a sequence motif that appears to get preferentially trimmed, independent of breathing. Importantly, their findings are not dataset-dependent.

    Until now, there has been no model for VDJ trimming, a major mechanism in the process of VDJ recombination. With this model, we are now in a position to refine modeling tools for VDJ recombination. Importantly, the model developed by Russel et al. enables exploration of which biological sequence-based factors contribute most to VDJ trimming. To support their conclusions, the authors test their approach on multiple model architectures and AIRR datasets.

    While I agree that this is important work, the authors might be overstating the mechanistic insight achieved, given that the work relies solely on statistical inference. This point requires more discussion and support from the authors.

  3. Reviewer #2 (Public Review):

    In this work, the authors performed a comprehensive model comparison to find the best predictor of where V genes are trimmed during the V(D)J recombination process, using their DNA sequence alone. This is an important step towards characterizing how the diversity of T-cell receptors and antibodies is generated and towards better understanding the function of the enzymes involved in the process, such as Artemis.

    The authors find that the best model uses a combination of the sequence-specific position-weight matrix and the GC content of DNA on both sides of the cutting site, which they relate to the DNA's ability to "breathe." Their conclusions are based on a rigorous comparison of log-likelihoods using independent test data from loci other than the one on which the models were trained. The study also includes myriad tests and controls, increasing confidence in their conclusions.
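
The log-likelihood comparison mentioned above can be sketched in a few lines. This is only an illustration of the general idea of scoring competing models on held-out data, not the authors' evaluation code. The function name `mean_log_likelihood`, the dictionary layout, the gene label, and the toy counts and probabilities are all hypothetical.

```python
import math

def mean_log_likelihood(predicted, observed_counts):
    """Average log-likelihood per observed trimming event.

    predicted: dict mapping gene name -> list of trimming probabilities,
        indexed by number of nucleotides trimmed.
    observed_counts: dict mapping gene name -> list of held-out counts
        with the same indexing.
    A predicted probability of zero for an observed event raises
    ValueError from math.log, which flags an unusable model.
    """
    total_ll = 0.0
    total_n = 0
    for gene, counts in observed_counts.items():
        probs = predicted[gene]
        for p, c in zip(probs, counts):
            if c > 0:
                total_ll += c * math.log(p)
                total_n += c
    return total_ll / total_n

# Toy held-out data for a single hypothetical gene (made-up counts).
held_out = {"gene_A": [10, 30, 60]}

# Two made-up candidate models: one matching the observed frequencies,
# one assigning equal probability to every trimming amount.
model_fitted = {"gene_A": [0.1, 0.3, 0.6]}
model_uniform = {"gene_A": [1 / 3, 1 / 3, 1 / 3]}

ll_fitted = mean_log_likelihood(model_fitted, held_out)
ll_uniform = mean_log_likelihood(model_uniform, held_out)
```

A higher (less negative) mean log-likelihood on the held-out counts indicates the better predictor. In this toy case the model that matches the observed frequencies scores higher than the uniform one.
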