Benchmarking tRNA-Seq quantification approaches by realistic tRNA-Seq data simulation identifies two novel approaches with higher accuracy

Curation statements for this article:
  • Curated by eLife

    eLife Assessment

    This study provides an important resource by thoroughly benchmarking multiple sequencing-based tRNA quantification methods. The suggested best practice is supported by convincing evidence from in silico experiments in multiple scenarios.

Abstract

Quantification of transfer RNAs (tRNAs) using Illumina sequencing-based tRNA-Seq is complicated by their high degree of redundancy and extensive modifications. As a result, no tRNA-Seq method has become well established, and various approaches have been proposed to quantify tRNAs from sequencing reads. Here, we use realistic tRNA-Seq simulations to benchmark tRNA-Seq quantification approaches, including two novel approaches. We demonstrate that these novel approaches are consistently the most accurate, using data simulated to mimic five different tRNA-Seq methods. This simulation-based benchmarking also identifies specific shortfalls for each quantification approach and suggests that up to 13% of the variance observed between cell lines in real tRNA-Seq data could be due to systematic differences in quantification accuracy.

Article activity feed

  1. eLife Assessment

    This study provides an important resource by thoroughly benchmarking multiple sequencing-based tRNA quantification methods. The suggested best practice is supported by convincing evidence from in silico experiments in multiple scenarios.

  2. Reviewer #1 (Public review):

    Summary:

    In the manuscript titled "Benchmarking tRNA-Seq quantification approaches by realistic tRNA-Seq data simulation identifies two novel approaches with higher accuracy," Tom Smith and colleagues conducted a comparative evaluation of various sequencing-based tRNA quantification methods. The inherent challenges in accurately quantifying tRNA transcriptional levels, stemming from their short sequences (70-100 nt), extensive redundancy (~600 copies in the human genome with numerous isoacceptors and isodecoders), and potential for over 100 post-transcriptional chemical modifications, necessitate sophisticated approaches. Several wet-experimental methods (QuantM-tRNA, mim-tRNA, YAMAT, DM-tRNA, and ALL-tRNA) combined with bioinformatics tools (bowtie2-based, SHRiMP, and mimseq) have been proposed for this purpose. However, their practical strengths and weaknesses have not been comprehensively explored to date. In this study, the authors systematically assessed and compared these methods, considering factors such as incorrect alignments, multiple alignments, misincorporated bases (experimental errors), truncated reads, and correct assignments. Additionally, the authors introduced their own bioinformatic approaches (referred to as Decision and Salmon), which, while not without flaws (as perfection is unattainable), exhibit significant improvements over existing methods.

    Strengths:

    The manuscript meticulously compares tRNA quantification methods, offering a comprehensive exploration of each method's relative performance using standardized evaluation criteria. Recognizing the absence of "ground-truth" data, the authors generated in silico datasets mirroring common error profiles observed in real tRNA-seq data. Through the utilization of these datasets, the authors gained insights into prevalent sources of tRNA read misalignment and their implications for accurate quantification. Lastly, the authors proposed their own downstream analysis pipelines (Salmon and Decision), enhancing the manuscript's utility.

  3. Reviewer #2 (Public review):

    Summary:

    The authors provided benchmarking study results on tRNA-seq in terms of read alignment and quantification software with optimal parameterization. This result can be a useful guideline for choosing optimal parameters for tRNA-seq read alignment and quantification.

    Strengths:

    Benchmarking results for read alignment can be a useful guideline for choosing optimal parameters and a mapping strategy (mapping to amino acid) for various tRNA-Seq methods.

    Weaknesses:

    Some explanations of the sequencing data analysis pipeline are not clear to general readers.

  4. Author response:

    The following is the authors’ response to the original reviews.

    Reviewer 1:

    Because tRNA-sequencing methods have not been widely used (compared to mRNA-seq), many readers would not be familiar with the characteristics of different methods introduced in this study (QuantM-tRNA, mim-tRNA, YAMAT, DM-tRNA, and ALL-tRNA; bowtie2-based, SHRiMP, and mimseq; what are the main features of "Salmon?"). The manuscript will read better when the basic features of these methods are described in the manuscript, however brief.

    Introduction page 4 now clarifies a little more the difference between bowtie2, SHRiMP and mimseq. Results page 9 briefly summarises the differences between the tRNA-Seq methods. Results page 14 clarifies how Decision and Salmon work.

    Reviewer 2:

    (1) The explanation of the parameter D for bowtie2 sounds ambiguous. "How much effort to expend" needs to be explained in more detail.

    Results page 6 gives a more precise explanation of the D parameter.

    (2) Please provide optimal parameters (L and D) for tRNA-seq alignment.

    I think optimal here is not possible to determine. It will depend on the species, the frequency of misincorporations due to modifications (tRNA-Seq protocol specific) and how long one is willing to let bowtie continue searching for a better match. The point of Figure 1a is that D needs to be increased if L is decreased and an error is allowed in the seed. I think the sentence in the results section Figure 1a is the appropriate way to express this without committing to a single ‘optimal’ parameterisation: ‘We observed that when an error in the seed is allowed, as the seed length is decreased, there needs to be a concomitant increase in effort expended to allow bowtie2 more opportunities to find the best possible alignment, especially with respect to the Transcript ID’.

    (3) I think the authors chose L=10 and D=100 based on Figure 1A. Which dataset did you choose for this parameterization among ALL-tRNAseq, DM-tRNAseq, mim-tRNAseq, QuantM-tRNA-seq, and YAMAT-seq?

    Figure 1A is based on simulation of full-length reads with only sequencing errors, i.e. not from any tRNA-Seq method in particular. This is stated in the results text and I’ve clarified in the figure legend.

    (4) Salmon does not need a read alignment process such as Bowtie2. Hence, it is not clear "Only results from alignment with bowtie2" in Figure legend for Figure 4a.

    I’m using Salmon in ‘alignment-mode’, taking the alignments from bowtie2. I’ve clarified this in results page 14.

  5. Author response:

    We thank the reviewers for their critical appraisal of our manuscript. We will address the points of confusion and/or lack of clarity in a revised manuscript. We agree with reviewer 1 that applying the best practice pipeline(s) on new experimental data and comparing this approach with current practices would be a useful demonstration of how this alters the biological interpretation. This is something we are in the process of completing but believe this is best addressed in a separate manuscript where we can focus on the associated biological findings, allowing this manuscript to remain focused on the accurate quantification of tRNA-Seq data.

  6. eLife assessment

    This study provides a valuable resource by thoroughly benchmarking multiple sequencing-based tRNA quantification methods. The suggested best practice is supported by solid evidence from in silico experiments in multiple scenarios. The major weakness of the manuscript is the incomplete validation of newly generated experimental datasets.

  7. Reviewer #1 (Public Review):

    Summary:

    In the manuscript titled "Benchmarking tRNA-Seq quantification approaches by realistic tRNA-Seq data simulation identifies two novel approaches with higher accuracy," Tom Smith and colleagues conducted a comparative evaluation of various sequencing-based tRNA quantification methods. The inherent challenges in accurately quantifying tRNA transcriptional levels, stemming from their short sequences (70-100 nt), extensive redundancy (~600 copies in the human genome with numerous isoacceptors and isodecoders), and potential for over 100 post-transcriptional chemical modifications, necessitate sophisticated approaches. Several wet-experimental methods (QuantM-tRNA, mim-tRNA, YAMAT, DM-tRNA, and ALL-tRNA) combined with bioinformatics tools (bowtie2-based, SHRiMP, and mimseq) have been proposed for this purpose. However, their practical strengths and weaknesses have not been comprehensively explored to date. In this study, the authors systematically assessed and compared these methods, considering factors such as incorrect alignments, multiple alignments, misincorporated bases (experimental errors), truncated reads, and correct assignments. Additionally, the authors introduced their own bioinformatic approaches (referred to as Decision and Salmon), which, while not without flaws (as perfection is unattainable), exhibit significant improvements over existing methods.

    Strengths:

    The manuscript meticulously compares tRNA quantification methods, offering a comprehensive exploration of each method's relative performance using standardized evaluation criteria. Recognizing the absence of "ground-truth" data, the authors generated in silico datasets mirroring common error profiles observed in real tRNA-seq data. Through the utilization of these datasets, the authors gained insights into prevalent sources of tRNA read misalignment and their implications for accurate quantification. Lastly, the authors proposed their downstream analysis pipelines (Salmon and Decision), enhancing the manuscript's utility.

    Weaknesses:

    As discussed in the manuscript, the error profiles derived from real-world tRNA-seq datasets may still harbor biases, as reads that failed to "align" in the analysis pipelines were not considered. Additionally, the authors did not validate the efficacy of their "best practice" pipelines on new real-world datasets, preferably those generated by the authors themselves. Such validation would not only confirm the improvements but also demonstrate how these pipelines could alter biological interpretations.

    Because tRNA-sequencing methods have not been widely used (compared to mRNA-seq), many readers would not be familiar with the characteristics of different methods introduced in this study (QuantM-tRNA, mim-tRNA, YAMAT, DM-tRNA, and ALL-tRNA; bowtie2-based, SHRiMP, and mimseq; what are the main features of "Salmon?"). The manuscript will read better when the basic features of these methods are described in the manuscript, however brief.

  8. Reviewer #2 (Public Review):

    Summary:

    The authors provided benchmarking study results on tRNA-seq in terms of read alignment and quantification software with optimal parameterization. This result can be a useful guideline for choosing optimal parameters for tRNA-seq read alignment and quantification.

    Strengths:

    Benchmarking results for read alignment can be a useful guideline for choosing optimal parameters and a mapping strategy (mapping to amino acid) for various tRNA-Seq methods.

    Weaknesses:

    The topic is highly specific, and the novelty of the analysis might not be widely useful for general readers.

    Some details of the sequencing data analysis pipeline are not clear for general readers:

    (1) The explanation of the parameter D for bowtie2 sounds ambiguous. "How much effort to expend" needs to be explained in more detail.

    (2) Please provide optimal parameters (L and D) for tRNA-seq alignment.

    (3) I think the authors chose L=10 and D=100 based on Figure 1A. Which dataset did you choose for this parameterization among ALL-tRNAseq, DM-tRNAseq, mim-tRNAseq, QuantM-tRNA-seq, and YAMAT-seq?

    (4) Salmon does not need a read alignment process such as Bowtie2. Hence, it is not clear "Only results from alignment with bowtie2" in Figure legend for Figure 4a.
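
For readers unfamiliar with the tools named in points (1) to (4), the following is a minimal sketch of how the bowtie2 seed length (L), effort (D) and seed-mismatch settings, and Salmon's alignment-based mode, might be expressed on the command line. It only builds argument lists and runs nothing. The values L=10 and D=100 are taken from the reviewer's reading of Figure 1A in point (3), not from a parameterisation confirmed by the authors; `-N 1` is an assumption standing in for "an error is allowed in the seed"; the function names and all file names are hypothetical.

```python
from typing import List


def bowtie2_command(index: str, reads: str, out_sam: str,
                    seed_len: int = 10, effort: int = 100,
                    seed_mismatches: int = 1) -> List[str]:
    """Build (but do not run) a bowtie2 argument list.

    Defaults are illustrative assumptions, not the authors' confirmed settings.
    """
    if seed_mismatches not in (0, 1):
        raise ValueError("bowtie2 -N accepts only 0 or 1")
    return [
        "bowtie2",
        "-L", str(seed_len),         # length of the seed substrings
        "-D", str(effort),           # consecutive failed seed extensions before giving up
        "-N", str(seed_mismatches),  # mismatches permitted within a seed
        "-x", index,                 # hypothetical index prefix built from tRNA sequences
        "-U", reads,                 # unpaired reads
        "-S", out_sam,               # SAM output
    ]


def salmon_alignment_command(transcripts_fa: str, alignments: str,
                             out_dir: str) -> List[str]:
    """Build (but do not run) a Salmon alignment-based quantification command.

    In this mode Salmon does no mapping itself; it quantifies from
    alignments produced by another tool, here assumed to be bowtie2.
    """
    return [
        "salmon", "quant",
        "-t", transcripts_fa,  # the same sequences the reads were aligned to
        "-l", "A",             # let Salmon infer the library type
        "-a", alignments,      # existing SAM/BAM alignments
        "-o", out_dir,
    ]
```

A shorter seed is paired with a larger `-D` here because, as the author response notes, reducing L while allowing a seed error requires more search effort to reach the best alignment. Either list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed.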