The use of non-functional clonotypes as a natural calibrator for quantitative bias correction in adaptive immune receptor repertoire profiling

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This paper describes a newly developed, publicly available algorithm (iROAR) that was tested on pre-existing datasets and is of interest to T and B cell immunologists who perform repertoire analysis via multiplex PCR-based techniques. iROAR utilises naturally occurring non-functional sequences to estimate and partially correct the amplification bias inherent in multiplex PCR-based sequencing technologies.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)


Abstract

High-throughput sequencing of adaptive immune receptor repertoires is a valuable tool for gaining insights into adaptive immunity. Several powerful TCR/BCR repertoire reconstruction and analysis methods have been developed in the past decade. However, detecting and correcting the discrepancy between real and experimentally observed lymphocyte clone frequencies is still challenging. Here, we discovered a hallmark anomaly in the ratio between read count-based and clone count-based frequencies of non-functional clonotypes in multiplex PCR-based immune repertoires. By quantifying this anomaly, we formulated a quantitative measure of the V- and J-gene frequency bias introduced by multiplex PCR during library preparation, called the Over Amplification Rate (OAR). Based on the OAR concept, we developed original software for multiplex PCR-specific bias evaluation and correction named iROAR: immune Repertoire Over Amplification Removal (https://github.com/smiranast/iROAR). The iROAR algorithm was successfully tested on previously published TCR repertoires obtained using both 5’ RACE (Rapid Amplification of cDNA Ends)-based and multiplex PCR-based approaches and was compared with a biological spike-in-based method for PCR bias evaluation. The developed approach can increase the accuracy and consistency of repertoires reconstructed by different methods, making them more suitable for comparative analysis.
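One way to write the OAR concept down, offered as a sketch: the notation is introduced here for illustration, it assumes that the per-gene OAR is the ratio of the read-based to the clone-based frequency of a gene among non-functional clonotypes, and iROAR's actual formula may differ in details (for example, in how V and J genes are combined or how outlier clonotypes are handled). The second part adds a simple, assumed bias model to show why OAR is expected to be close to 1 without gene-specific bias.

```latex
% N: set of non-functional (e.g. out-of-frame) clonotypes in one repertoire
% r_c: read count of clonotype c;  g(c): its V gene (or J gene)
% n_g: number of non-functional clonotypes using gene g
f^{\mathrm{reads}}_g = \frac{\sum_{c \in N:\, g(c) = g} r_c}{\sum_{c \in N} r_c},
\qquad
f^{\mathrm{clones}}_g = \frac{n_g}{\lvert N \rvert},
\qquad
\mathrm{OAR}_g = \frac{f^{\mathrm{reads}}_g}{f^{\mathrm{clones}}_g}

% Assumed bias model: PCR multiplies the reads of every clonotype using gene h by b_h,
% and unbiased per-clonotype read counts (mean m) do not depend on the gene.
\mathrm{OAR}_g \approx
\frac{b_g m n_g \big/ \sum_h b_h m n_h}{n_g / \lvert N \rvert}
= \frac{b_g}{\bar{b}},
\qquad
\bar{b} = \frac{1}{\lvert N \rvert} \sum_h n_h b_h
```

Under this model, OAR_g = 1 for every gene when all b_h are equal, and otherwise measures each gene's amplification relative to the clone-weighted average bias. Dividing a clonotype's reads by OAR_g therefore removes the gene-specific component and leaves only the single global factor b̄.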

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    Alexander Komkov et al. developed a novel software tool/algorithm (iROAR) that utilises naturally occurring non-functional clonotypes as a control repertoire to correct for the amplification bias associated with the multiplex PCR-based technologies commonly used in TCR/BCR repertoire analysis. No new data were generated; the study uses only publicly available datasets. The authors first define the over amplification rate (OAR), a metric that is close to 1 when there is little or no amplification bias, and validate it by calculating the OAR for repertoires obtained using 5'-RACE, a method known to have little to no amplification bias. This was a great control to have and is essential for validating the OAR measurement. In contrast, multiplex PCR-based protocols such as VMPlex and VJMplex showed significant deviations in the distribution of OAR.

    Strengths: The authors used publicly available datasets obtained with both biased (multiplex PCR-based) and low-bias (5'-RACE) methods for determining TCR/BCR repertoires. In addition, the authors generated in silico biased 5'-RACE datasets. These comparisons are critical in determining the effect of bias correction.
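The validation logic described in this review (OAR close to 1 for a low-bias repertoire, deviating once bias is introduced in silico) can be illustrated with a small toy sketch. It assumes that the per-gene OAR is the read-based frequency of a gene divided by its clone-based frequency among non-functional clonotypes, uses made-up gene labels and counts, and models in silico bias as a hypothetical per-gene multiplier on read counts. It is not the iROAR implementation.

```python
from collections import Counter, defaultdict


def per_gene_oar(clonotypes):
    """Per-gene OAR for non-functional clonotypes given as (gene, read_count) pairs.

    Assumed definition: read-based gene frequency / clone-based gene frequency.
    """
    clone_counts = Counter(gene for gene, _ in clonotypes)
    read_counts = defaultdict(int)
    for gene, reads in clonotypes:
        read_counts[gene] += reads
    total_clones = len(clonotypes)
    total_reads = sum(read_counts.values())
    return {
        gene: (read_counts[gene] / total_reads) / (clone_counts[gene] / total_clones)
        for gene in clone_counts
    }


def apply_in_silico_bias(clonotypes, multipliers):
    """Hypothetical in silico bias: scale each clonotype's reads by its gene's multiplier."""
    return [(gene, reads * multipliers.get(gene, 1)) for gene, reads in clonotypes]


# Toy low-bias (5'-RACE-like) non-functional repertoire: every clonotype has 10 reads.
unbiased = [("V1", 10)] * 4 + [("V2", 10)] * 2 + [("V3", 10)] * 4
# Simulate threefold over-amplification of V1 only.
biased = apply_in_silico_bias(unbiased, {"V1": 3})

unbiased_oar = per_gene_oar(unbiased)
biased_oar = per_gene_oar(biased)

for gene in sorted(unbiased_oar):
    print(f"{gene}: unbiased OAR = {unbiased_oar[gene]:.3f}, biased OAR = {biased_oar[gene]:.3f}")
```

With these toy numbers every unbiased OAR is 1.000, while the simulated bias gives about 1.667 for V1 and 0.556 for V2 and V3. Because OAR is a relative measure, over-amplifying one gene also pushes the OAR of the other genes below 1.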

    Weaknesses: The TCR/BCR repertoire analyses are very generalised and focus mainly on numbers of clonotypes. The algorithm could be used more widely if the effect of iROAR on other repertoire analyses were determined or discussed. For example, does iROAR affect measures of diversity? The identification of rare but unique clonotypes? The ability to detect true clonal expansions? Additionally, documentation for the software is lacking and largely inaccessible to non-specialists.

    By default, iROAR does not affect diversity and does not remove any clones. This statement was added to the manuscript. For now, analysing the potential effect on the detection of true clonal expansions is infeasible due to the lack of appropriate data with sufficient sequencing coverage. We have also provided a more detailed description of the iROAR software.

    Reviewer #2 (Public Review):

    In this paper, Komkov et al. describe a novel approach for computational correction of PCR amplification bias in adaptive immune receptor repertoire (AIRR) sequencing data (AIRR-seq). Their correction algorithm is based on using out-of-frame rearrangements to approximate gene-specific amplification bias. Gene-specific relative frequencies among out-of-frame rearrangements are not altered by clonal expansion except to the extent that out-of-frame rearrangements are passengers in clones expanding as a consequence of the specificity of the functional rearrangement. Due to independence between the two rearrangements, it can be reasonably assumed that the effects of clonal expansion are uniform in their impact on the observed V- and J-gene frequencies among out-of-frame rearrangements. Komkov et al. further assume that gene-specific relative frequencies among unique, out-of-frame rearrangements approximate recombination frequencies and that the extent to which gene-specific relative frequencies among all out-of-frame rearrangements deviate from those among unique, out-of-frame rearrangements provides an estimate of gene-specific PCR amplification bias. The ratio of V- or J-gene relative frequencies among all out-of-frame rearrangements to the corresponding relative frequency among unique out-of-frame rearrangements provides this estimate and can be used as a correction factor during data processing. It also serves as the basis for a repertoire-level metric of the overall extent of amplification bias in a repertoire.
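The correction step summarised here can be sketched as follows, under stated assumptions: the per-V-gene ratio is estimated from out-of-frame (non-functional) clonotypes only, and each clonotype's read count is then divided by the ratio for its V gene. The `Clonotype` record, the function names, the V-gene-only handling and the "factor 1 for genes without out-of-frame data" rule are all hypothetical choices for illustration; iROAR itself may combine V and J genes, iterate, or filter clonotypes differently.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Clonotype:
    v_gene: str
    reads: float
    functional: bool  # False for out-of-frame / stop-codon clonotypes


def oar_from_nonfunctional(clonotypes):
    """Per-V-gene ratio of read-based to clone-based frequency, non-functional clonotypes only."""
    nonfunctional = [c for c in clonotypes if not c.functional]
    if not nonfunctional:
        return {}
    clone_counts = Counter(c.v_gene for c in nonfunctional)
    read_counts = defaultdict(float)
    for c in nonfunctional:
        read_counts[c.v_gene] += c.reads
    total_clones = len(nonfunctional)
    total_reads = sum(read_counts.values())
    return {
        gene: (read_counts[gene] / total_reads) / (clone_counts[gene] / total_clones)
        for gene in clone_counts
    }


def correct_reads(clonotypes):
    """Divide each clonotype's reads by the ratio for its V gene.

    Genes with no non-functional clonotypes keep a factor of 1 (unchanged).
    No clonotypes are added or removed; only read counts are rescaled.
    """
    oar = oar_from_nonfunctional(clonotypes)
    return [
        Clonotype(c.v_gene, c.reads / oar.get(c.v_gene, 1.0), c.functional)
        for c in clonotypes
    ]


# Toy repertoire in which V1 is over-amplified threefold relative to V2.
repertoire = (
    [Clonotype("V1", 30, False)] * 2
    + [Clonotype("V2", 10, False)] * 2
    + [Clonotype("V1", 300, True), Clonotype("V2", 100, True)]
)
corrected = correct_reads(repertoire)
functional = [(c.v_gene, round(c.reads, 1)) for c in corrected if c.functional]
print(functional)  # [('V1', 200.0), ('V2', 200.0)]
```

In this toy case the out-of-frame clonotypes give ratios of 1.5 for V1 and 0.5 for V2, and after division both functional clonotypes have 200 reads, matching the 1:1 ratio they would have without the simulated bias. Because the ratio is relative, corrected counts are defined only up to a global scale, so comparisons are best made on renormalised frequencies.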

    This is a very nice and, to the best of my knowledge, novel idea. The proposed correction factor and metric have potential utility in all studies conducting AIRR-seq that use a PCR amplification step. While the proposed approach may not have superior or even equal performance when compared to biological spike-ins, it still has great potential utility given the time and financial costs and required expertise of using biological spike-ins and because it can be applied to data sets that have already been generated. Incorporation of this approach into AIRR-seq data processing has the potential to increase the accuracy of downstream analyses. It also has the potential to enhance the comparability of results across studies and to reduce the effects of different sequencing protocols for data re-use when data are integrated across studies.

    Enthusiasm is dampened by the fact that the proposed method is not directly compared to the gold standard of biological spike-ins.

    During manuscript revision, we designed and performed an additional wet-lab experiment to directly compare the iROAR approach with biological spike-ins.
