Using synthetic RNA to benchmark poly(A) length inference from direct RNA sequencing
This article has been reviewed by the following groups
Listed in
- Evaluated articles (GigaScience)
Abstract
Polyadenylation is a dynamic process that is important in cellular physiology, with implications for messenger RNA decay rates, translation efficiency, and isoform-specific regulation. Oxford Nanopore Technologies direct RNA sequencing provides a strategy for sequencing the full-length RNA molecule and analysis of the transcriptome. Several tools are currently available for poly(A) tail length estimation, including well-established methods like tailfindr and nanopolish, as well as more recent deep learning models like Dorado. However, there has been limited benchmarking of the accuracy of these tools against gold-standard datasets. In this article, we present our novel deep learning poly(A) estimation tool, BoostNano, and compare it with 3 existing tools: tailfindr, nanopolish, and Dorado. We evaluate the 4 poly(A) estimation tools using 2 sets of synthetic in vitro transcribed RNA standards with known poly(A) tail lengths: Sequin (30 or 60 nucleotides) and enhanced green fluorescent protein (10–150 nucleotides) RNA. Analyzing datasets with known ground-truth values is a valuable approach to measuring the accuracy of poly(A) length estimation. The tools demonstrated length- and sample-dependent performance, and accuracy was enhanced by averaging over multiple reads via estimation of the peak of the density distribution. Overall, Dorado is recommended as the preferred approach due to its relatively fast runtimes, low mean error, and ease of use through its integration with base-calling. These results provide a reference for poly(A) tail length estimation analysis, aiding our understanding of the transcriptome and the relationship between poly(A) tail length and other transcriptional mechanisms, such as transcript stability and quantification.
Article activity feed
-
AbstractPolyadenylation is a dynamic process which is important in cellular physiology. Oxford Nanopore Technologies direct RNA-sequencing provides a strategy for sequencing the full-length RNA molecule and analysis of the transcriptome and epi-transcriptome. There are currently several tools available for poly(A) tail-length estimation, including well-established tools such as tailfindr and nanopolish, as well as two more recent deep learning models: Dorado and BoostNano. However, there has been limited benchmarking of the accuracy of these tools against gold-standard datasets. In this paper we evaluate four poly(A) estimation tools using synthetic RNA standards (Sequins), which have known poly(A) tail-lengths and provide a valuable approach to measuring the accuracy of poly(A) tail-length estimation. All four tools generate mean tail-length estimates which lie within 12% of the correct value. Overall, Dorado is recommended as the preferred approach due to its relatively fast run times, low coefficient of variation and ease of use with integration with base-calling.
This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf098), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:
Reviewer 2: Jesse Daniel Brown
This manuscript addresses a relevant and timely question: benchmarking poly(A) tail-length estimation tools (BoostNano, tailfindr, nanopolish, and Dorado) using synthetic RNA standards (Sequins) with known tail lengths. Poly(A) tail-length estimation is increasingly important for understanding mRNA stability, processing, and regulation at the single-molecule level. As direct RNA sequencing expands in use, reliable methods to measure poly(A) tail lengths are needed. The study's design, leveraging Sequins as a "gold standard" to benchmark tools, is strong and fills an area of need in the current literature. The analysis is thorough in its basic comparisons, and the results are likely to be useful to researchers who need to choose suitable software for poly(A) tail analysis. However, the manuscript would benefit from deeper contextualization, more rigorous statistical methodology, and clearer reporting of computational details. Ensuring reproducibility and providing clearer guidance on interpreting the results in real biological contexts would strengthen the manuscript. The suggestions below are aimed at making the study more valuable to the community. For this reason, my recommendation is that revisions are needed.
Introduction
Abstract: ★★★★☆ (4/5) Serving in place of an introduction, it has its strengths: it adequately outlines why polyadenylation is biologically important and why direct RNA sequencing provides a unique opportunity for poly(A) tail-length estimation. It justifies the use of Sequins as synthetic standards, which is a robust approach to derive ground-truth tail lengths.
Areas for Improvement:The introduction could better connect poly(A) tail-length estimation to downstream applications. For instance, mention how accurate tail-length estimation could improve understanding of mRNA decay rates, translation efficiency, or isoform-specific regulation.
Adding references that contextualize poly(A) tail dynamics in broader biological phenomena would help readers understand the significance. For example, it is almost a necessity to cite work such as "Roles of mRNA poly(A) tails in regulation of eukaryotic gene expression" by Lori A. Passmore & Jeff Coller (2022, Nature Reviews Molecular Cell Biology), which provides a comprehensive analysis of poly(A) tail dynamics and their impact on mRNA decay, stability, and translation regulation. Passmore & Coller (2022) also expand on these principles by discussing the mechanistic underpinnings of poly(A)-mediated decay and translation regulation, making it a broader and more recent contribution to polyadenylation biology, which the authors should consider.
Grammar of the abstract: Error: "There are currently several tools available for poly(A) tail-length estimation, including well-established tools such as tailfindr and nanopolish, as well as two more recent deep learning models: Dorado and BoostNano." Suggestion: "Several tools are currently available for poly(A) tail-length estimation, including well-established methods like tailfindr and nanopolish, as well as two more recent deep learning models: Dorado and BoostNano."
Error: "which lie within 12% of the correct value." Suggestion: "that lie within 12% of the correct value."
Clarify the library preparation steps to avoid confusion about the "direct" nature of RNA sequencing. The text currently implies that no reverse transcription is required, but then references an ONT Reverse Transcription Adapter. Distinguish between a full-length cDNA synthesis step (not required) and the use of a poly(T)-containing adapter for sequencing library preparation.
Methods
Methods: ★★★★☆ (4/5) The methods section has its strengths; the data sources and preparation (Sequins spiked into host RNA) are clearly described. Versions of tools are provided, enhancing reproducibility.
Areas for Improvement are statistical analysis, comparisons and tests, hardware and computation details, and understanding of run time differences. Currently, the study models distributions as normal and uses mean and SD, but no normality tests or justification for these choices are presented. Consider performing normality tests or using nonparametric measures. Additionally, providing confidence intervals or other robust statistics (median, interquartile ranges) would clarify variability.
For the comparisons and tests, the authors should explain why they chose root mean square error (RMSE) minimization and the other metrics. Could alternative tests be used to compare the distributions of tail-length estimates more rigorously? A Wilcoxon signed-rank test is a non-parametric test suitable for paired comparisons when the assumption of normality is not met; it would be useful for comparing the predicted tail lengths from each tool against the expected lengths, especially if the data distribution is skewed. A paired t-test could be applied if the normality assumption holds, providing a straightforward way to assess whether the mean difference between predicted and expected values is statistically significant. If such tests are not used, justification should be provided for why not.
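To make the suggestion concrete, here is a minimal sketch of the paired t statistic computed on hypothetical per-read tail-length estimates (the numbers below are invented for illustration and are not taken from the manuscript):

```python
import math
import statistics

def paired_t_statistic(pred, truth):
    """Paired t statistic on per-read differences (predicted - expected).

    Compare |t| against the t distribution with n-1 degrees of freedom
    (e.g. via scipy.stats) to obtain a p-value.
    """
    diffs = [p - t for p, t in zip(pred, truth)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Hypothetical tool estimates against a known 60 nt Sequin tail
estimates = [57, 59, 62, 58, 61, 56, 60, 58]
t_stat = paired_t_statistic(estimates, [60] * len(estimates))
print(t_stat)
```

A negative t here would indicate systematic underestimation of the tail length; a p-value step (omitted to keep the sketch dependency-free) would decide significance.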
There are some additional metrics to explore. Median absolute deviation (MAD) is robust to outliers and could complement RMSE to give a better picture of central tendency and variability. Mean absolute error (MAE) simplifies interpretation by focusing solely on the magnitude of errors without squaring them, potentially offering more intuitive insight for readers. The authors should also address testing for normality, explicitly stating whether normality tests (e.g., Shapiro-Wilk or Kolmogorov-Smirnov) were conducted on the data. If normality is confirmed, the use of parametric statistics such as t-tests should be justified; if not, the authors should justify why non-parametric tests (e.g., Wilcoxon) were not employed or discuss plans to include them in future studies.
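The suggested error metrics are straightforward to compute; a minimal pure-Python sketch on hypothetical per-read estimates (the values are invented for illustration, not the paper's data):

```python
import math
import statistics

def mae(pred, truth):
    """Mean absolute error between predicted and expected tail lengths."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)

def rmse(pred, truth):
    """Root mean square error; squaring penalizes large deviations more."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))

def mad(values):
    """Median absolute deviation, a spread estimate robust to outliers."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

# Hypothetical per-read estimates against a known 30 nt Sequin tail
estimates = [28, 31, 29, 33, 30, 27, 35, 30]
truth = [30] * len(estimates)
print(mae(estimates, truth))
print(rmse(estimates, truth))
print(mad(estimates))
```

Reporting MAE and MAD alongside RMSE would show whether the tails of the error distribution (e.g. the short-read artifacts discussed below) dominate the headline numbers.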
The authors should also explain the choice of statistical methods by discussing how the chosen tests align with the study's goals. For example, emphasize whether the focus was on understanding the overall error distribution, tool consistency, or accuracy in predicting specific tail lengths.
The authors could complement the statistical tests with visual aids such as boxplots, violin plots, or Bland-Altman plots to illustrate the error distributions and discrepancies between predicted and actual tail lengths across tools.
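A Bland-Altman comparison reduces to a bias and limits-of-agreement calculation before any plotting; a sketch on hypothetical paired estimates from two tools (the values are invented for illustration):

```python
import statistics

def bland_altman(a, b):
    """Bland-Altman agreement statistics for paired measurements.

    Returns (bias, lower, upper): the mean of the per-read differences
    and the 95% limits of agreement (bias +/- 1.96 * SD of differences).
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical per-read tail-length calls for the same reads by two tools
dorado = [28, 30, 31, 29, 27, 32]
boostnano = [30, 31, 33, 30, 29, 33]
bias, lo, hi = bland_altman(dorado, boostnano)
print(f"bias={bias:.2f}, 95% LoA=({lo:.2f}, {hi:.2f})")
```

A non-zero bias with narrow limits would indicate a consistent offset between tools rather than random disagreement, which is exactly the distinction a density plot alone can obscure.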
The authors should provide hardware and computational details, giving explicit specifications of the computational environment (CPU/GPU models, RAM, OS) for each tool's run. While the GitHub README suggests how to run the system, it lacks any details about system requirements. Readers need this information to understand runtime differences and to replicate the performance measurements.
The authors should consider tool parameterization and indicate if any specific parameters (beyond defaults) were used in tailfindr, nanopolish, Dorado, or BoostNano runs. If no changes were made from defaults, state this explicitly.
Results
The results' strengths are that they are presented clearly, showing density distributions and discussing short-tail anomalies. The identification of Dorado as a preferred tool due to speed, integration, and conservative filtering is well supported by the data. The study acknowledges that all tools achieve broadly similar accuracy, differing mainly in runtime and filtering criteria, which is a practical insight for users.
The results have areas for improvement. Regarding the short-tail reads explanation: the authors attribute short (<10 nt) poly(A) tails to truncated transcripts or mis-priming. It is suggested that the authors strengthen this discussion with additional evidence or reasoning. For instance, is there a correlation between read quality and short-tail length estimates? Do truncated reads consistently align to internal A-rich stretches?

Multiple peaks in distributions: some density plots (Figure 1) show multiple peaks or shoulder peaks. Discuss potential reasons for these patterns. Are they related to tool-specific biases, read quality, or adapter/poly(T) truncation?

Application context: the results focus on method performance, but it would help readers to understand how these differences might influence downstream tasks. For example, if a method slightly overestimates poly(A) length, how could this affect conclusions about RNA stability or differential tail-length analysis between experimental conditions?

Figures and tables: Figure 1 has clear density plots, but consider adding vertical lines at the expected tail lengths (30 nt and 60 nt) to guide interpretation; splitting the figure into separate panels for R1 and R2, or using insets, might clarify the multiple peaks. Figure 2: the IGV snapshots are informative; enhance interpretability by adding annotations (arrows or boxes) highlighting truncated vs. full-length reads, and increase font sizes for readability. Figure 3: a useful comparison of reads filtered by Dorado but retained by BoostNano; add a brief note or labeling to indicate the expected tail lengths, and discuss possible reasons for Dorado's conservative filtering here or in the main text. Tables: provide definitions for abbreviations (nt, CPU, GPU) in captions; for Table 2, adding confidence intervals around the mean tail-length estimates would strengthen statistical rigor; for Table 3, specify hardware details as recommended above.
Grammar Mistakes and errors in the results section: Results Section: Sentence: "The four methods display a similar pattern in the density distribution, with a prominent normal-like peak near the expected poly(A) length, but also with a over-representation of shorter poly(A) tails, ranging at approximately ~0-10 nt (Figure 1)." Issue: "a over-representation" Correction: "an over-representation"
Sentence: "We expected that these shorter peaks were derived from either fragmentation of the transcript, mis-priming of internal poly(A) stretches or degradation of the poly(A) tails." Issue: tense mismatch ("expected" vs. "were derived") loses tense conformity. Correction: "We hypothesize that these shorter peaks are derived from either fragmentation of the transcript, mis-priming of internal poly(A) stretches, or degradation of the poly(A) tails."
Sentence: "Interestingly, upon investigating these earlier peaks, we found that Dorado excludes reads which are retained in the analysis by BoostNano, despite them being classified as passed reads (Figure 3)." Issue: Ambiguous pronoun "them." (them could incorrectly identify three possible targets in the sentence) Correction: "Interestingly, upon investigating these earlier peaks, we found that Dorado excludes reads retained in the analysis by BoostNano, even though these reads are classified as passed reads (Figure 3)."
Sentence: "Therefore, Dorado appears to be a more conservative approach than BoostNano." Issue: No grammar issues, but the statement could be more precise. Suggested improvement: "Thus, Dorado demonstrates a more conservative approach compared to BoostNano."
Sentence: "In order to determine which normal distribution fit the peak best, we found the parameters (mean, SD) which minimize the root mean square error between the candidate normal distribution and the density distribution for an interval of 10 nt to the right of the mode." Issue: Verb tense consistency ("fit"). Correction: "To determine which normal distribution fits the peak best, ..."
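The quoted fitting procedure can be sketched as a simple grid search. The following is a simplified reading of the described method; the step sizes, search ranges, and the synthetic density used to exercise it are assumptions for illustration, not the authors' implementation:

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and SD sd at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def fit_peak(density, mode, window=10):
    """Grid-search (mean, SD) minimising the RMSE between a candidate
    normal density and the observed density over [mode, mode + window].

    `density` maps integer tail length (nt) to observed density values.
    """
    xs = [x for x in range(mode, mode + window + 1) if x in density]
    best = None
    for mu10 in range((mode - 5) * 10, (mode + 5) * 10 + 1):  # 0.1 nt steps
        mu = mu10 / 10
        for sd10 in range(10, 101):  # candidate SDs from 1.0 to 10.0 nt
            sd = sd10 / 10
            err = math.sqrt(sum((normal_pdf(x, mu, sd) - density[x]) ** 2
                                for x in xs) / len(xs))
            if best is None or err < best[0]:
                best = (err, mu, sd)
    return best[1], best[2]  # (mean, SD) of the best-fitting normal

# Synthetic check: a density sampled from a true normal (mu=30, sd=3)
true_density = {x: normal_pdf(x, 30, 3) for x in range(20, 41)}
mu, sd = fit_peak(true_density, mode=30)
print(mu, sd)
```

Fitting only to the right of the mode, as the sentence describes, avoids contamination from the over-represented short-tail reads on the left side of the peak.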
Sentence: "The peaks also lose their normal-like behavior for larger values." Issue: Could use a more formal tone. Correction: "The peaks also deviate from their normal-like behavior at larger values."
Sentence: "Next, we compared the computational time required by each method to predict the tail-length of 4000 reads." Issue: Hyphenation of "tail-length." Correction: "Next, we compared the computational time required by each method to predict the tail length of 4,000 reads."
Sentence: "BoostNano also offers the option of using the Application Programming Interface (API) call instead of the direct method, which omits the file copy implemented in the direct approach, reducing the run time to 8 m 8 s." Here, the sentence is overwritten, which causes a lack of clarity. Correction: "BoostNano offers an alternative API-based method, which skips the file copy step of the direct approach, reducing the runtime to 8 minutes and 8 seconds."
Discussion
Discussion: ★★★☆☆ (3/5) The discussion has its strengths, as it correctly identifies that Dorado's advantages (speed, integration with basecalling) make it appealing as a default choice. The authors acknowledge that all tools are within a similar accuracy range, suggesting the deciding factor may be speed or integration rather than raw performance differences. However, there are areas for improvement. Further dissect the limitations of each tool: for example, BoostNano shows a good SD but a slightly off mean for R1; what does this mean for its use cases? Address the discrepancy between tailfindr, nanopolish, and Dorado in terms of how they define and detect poly(A) boundaries. Why does Dorado not evaluate start/end positions of poly(A) tails in event space, and how might this influence results? Include a brief discussion of how the results might generalize to more complex transcriptomes: real samples have varying GC content, fragment lengths, and potentially modified bases, and a short commentary acknowledging these factors would show awareness that synthetic standards cannot capture the full complexity of natural RNA populations. For these reasons, the authors should also suggest future directions. For instance, how could tool developers incorporate these findings to improve their methods? Could future benchmarking sets include a gradient of tail lengths to better understand length-specific biases?
Grammar Mistakes and errors in the discussion section: Sentence: "BoostNano and tailfindr tools provided estimation of the starting and ending positions of the poly(A) tails in event space while this information was absent in Dorado outputs." Issue: "provided estimation" should be "provide estimation" to align with present tense. Correction: "BoostNano and tailfindr tools provide estimation of the starting and ending positions of the poly(A) tails in event space, while this information is absent in Dorado outputs."
Sentence: "On the R1 dataset, BoostNano showed a tighter distribution with the smallest SD, but its peak was the furthest from the correct value." The issue here is that the results describe general truths, leading to verb tense inconsistency; "showed" should match the other verbs in the section. Correction: "On the R1 dataset, BoostNano shows a tighter distribution with the smallest SD, but its peak is the furthest from the correct value."
Sentence: "tailfindr had the most accurate estimation but also the largest error interval."
The issue here is a verb tense mismatch; "had" should be in the present tense, since the statement describes a general finding rather than a past one. Correction: "tailfindr has the most accurate estimation but also the largest error interval."
Sentence: "Furthermore, Boostnano is more lenient in keeping reads for poly(A) estimation than Dorado."
Issue: "Boostnano" capitalization error; it should be "BoostNano." Correction: "Furthermore, BoostNano is more lenient in keeping reads for poly(A) estimation than Dorado."
Sentence: "Overall, our results suggest that the four tools investigated in this study - BoostNano, tailfindr, nanopolish and Dorado have similar performance with their accuracy varying from one dataset to the other, with a potential length bias."
Issue: Missing commas for clarity; rephrase "with their accuracy varying from one dataset to the other" for conciseness. Correction: "Overall, our results suggest that the four tools investigated in this study (BoostNano, tailfindr, nanopolish, and Dorado) have similar performance, with accuracy varying across datasets and showing potential length bias."
Sentence: "Therefore, we expect Dorado to be implemented as the default method of poly(A) tail estimation in the near future, with the rapid estimation timeframe, comparable estimation lengths to other tools, conservative nature and the added benefit of ease of obtaining this information during basecalling."
There are several issues here including verbosity and lack of parallelism. Correction: "Therefore, we expect Dorado to be implemented as the default method for poly(A) tail estimation, given its rapid estimation timeframe, comparable accuracy to other tools, conservative nature, and ease of integration with basecalling."
Sentence: "This work demonstrates the value of having access to synthetic RNA molecules with known poly(A) tail-lengths for validating the accuracy of poly(A) tail estimation algorithms."
Issue: The phrase "validating the accuracy of" could be simplified for readability. Correction: "This work demonstrates the value of synthetic RNA molecules with known poly(A) tail lengths for validating poly(A) tail estimation algorithms."
Sentence: "As methods improve, we anticipate that these datasets will be valuable for assessing improvements in estimation of poly(A) tails."
Issue: "improvements in estimation of" is awkward. Correction: "As methods improve, we anticipate that these datasets will be valuable for assessing advancements in poly(A) tail estimation."
References need to be added to accommodate the suggested material for review, but the existing references are good.
NEEDS REVISION. Jesse Daniel Brown, PD, AASU
Note:
I previously reviewed this paper on Research Hub, and you can read those comments via the Research Hub review page here: https://www.researchhub.com/paper/8634403/using-synthetic-rna-to-benchmark-polya-length-inference-from-direct-rna-sequencing/reviews#threadId=55398.
The original preprint linked to the Research Hub review is here: https://doi.org/10.1101/2024.10.25.620206
Reviewer 1: Christoph Dieterich
In this manuscript, the authors present a benchmark to assess the performance of different tools designed for estimation of polyA tail length from Nanopore direct RNA-sequencing data. These tools include tailfindr, nanopolish, Dorado and BoostNano. Benchmarks of tools and algorithms for analyzing Nanopore data, both third-party tools and official ONT releases, are of utmost importance for the field. The use of synthetic constructs with known ground truth is recommended as well. Consequently, this study has the potential to provide a significant contribution to the field.
In its current form, however, I cannot recommend it for publication in GigaScience. My major concerns are: a) Use of only RNA002 data. This chemistry is outdated, and thus the benchmark is only relevant for old, possibly already published data. A comprehensive benchmark should also include RNA004 and the tools available there (at least Dorado). b) The current dataset only contains two polyA tail lengths, which are relatively short and do not cover the longer polyA tails that are common, e.g., in mammalian cells. A proper benchmark should show the performance of the analyzed tools over a range of polyA tail lengths.
Minor comments:
- Abstract: "All four tools generate mean tail-length estimates which lie within 13% of the correct value." The value of 13% is given in the abstract from the submission system, whereas the abstract in the main text says 12%. Which value is correct?
- Background, first paragraph: the role of the polyA tail in RNA circularization, which is required for efficient translation of cellular mRNAs, is not mentioned. A reference is missing for "is increasingly recognised as a dynamic process which influences timing and degree of protein production."
- Background, second paragraph: Chiron seems to be a relatively old basecaller (no models for new chemistries). It should be mentioned here that it is required for BoostNano.
- Mis-priming of internal polyA sites may be an important confounding (and currently overlooked) source of errors in Nanopore sequencing. This should be quantified properly and analyzed in more detail (length of these stretches, influence of other nucleotides within the A-rich stretch, etc.). It should also be done on whole-transcriptome data with more possible mis-priming sites.
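Quantifying potential mis-priming sites could start from a simple window scan for internal A-rich stretches. A hedged sketch follows; the window length, A-content threshold, and toy transcript are illustrative assumptions, not a validated pipeline:

```python
def a_rich_stretches(seq, min_len=10, min_frac=0.8):
    """Scan a transcript for internal A-rich windows on which a poly(T)
    adapter could mis-prime.

    Returns 0-based half-open (start, end) intervals of windows of length
    `min_len` with at least `min_frac` adenosine content, merging windows
    that overlap.
    """
    hits = []
    for i in range(len(seq) - min_len + 1):
        window = seq[i:i + min_len]
        if window.count("A") / min_len >= min_frac:
            if hits and i <= hits[-1][1]:
                # Extend the previous interval instead of starting a new one
                hits[-1] = (hits[-1][0], i + min_len)
            else:
                hits.append((i, i + min_len))
    return hits

# Toy transcript with one internal A-rich stretch in the middle
tx = "GCGCUAGC" + "AAAAAAAAAAGA" + "CGUACGUACG"
print(a_rich_stretches(tx))
```

Running such a scan over the Sequin references (and, as suggested, over a whole transcriptome) would let the authors test whether the short-tail reads preferentially align at these intervals.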
- Why do the authors think that the poly(T) stretch of the RTA might be truncated? It is composed of DNA oligos, which should be quite stable.
- What are the parameters for filtering used by Dorado and BoostNano? Can the authors explain, why the filtered reads differ?
- Dorado seems to systematically underestimate polyA tail length. Is this true also for data generated with RNA004 chemistry and longer polyA tails?
