Comparative analysis of eccDNA and circRNA tools shows increased accuracy of tool combination
This article has been reviewed by GigaScience (listed in: Evaluated articles).
Abstract
Introduction
Circular nucleic acids such as extrachromosomal circular DNA (eccDNA) and circular RNA (circRNA) are increasingly recognized for their biological relevance and potential as biomarkers in disease contexts. Despite their growing importance, their detection remains challenging due to tool-specific biases, limited validation frameworks, and high variability in performance across datasets.
Methods
We benchmarked 10 circle detection tools across diverse conditions using both simulated and biological datasets. Our evaluation included classical performance metrics and a novel internal measure of read distribution symmetry ($\Delta$CJ) to assess circle prediction confidence. We explored the impact of sequencing protocols, filtering strategies, and combined tool consensus.
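The "classical performance metrics" are presumably precision, recall and F-score, which the reviewers also refer to. Below is a minimal sketch of scoring one tool's predictions against a simulated truth set. It assumes that a prediction matches a true circle when both breakpoints fall within a fixed tolerance on the same chromosome. The `Circle` type, the `score` function and the 10 bp default tolerance are hypothetical choices, not the benchmark's actual matching rule.

```python
from typing import NamedTuple


class Circle(NamedTuple):
    chrom: str
    start: int  # circle junction start (hypothetical coordinate convention)
    end: int    # circle junction end


def matches(pred: Circle, ref: Circle, tol: int) -> bool:
    # Same chromosome and both breakpoints within the tolerance window (assumption)
    return (
        pred.chrom == ref.chrom
        and abs(pred.start - ref.start) <= tol
        and abs(pred.end - ref.end) <= tol
    )


def score(predicted: list[Circle], truth: list[Circle], tol: int = 10) -> dict:
    """Precision, recall and F-score of predictions against a simulated truth set."""
    matched: set[int] = set()
    tp = 0
    for pred in predicted:
        # Greedily pair each prediction with the first still-unmatched true circle,
        # so duplicate calls of one circle count as one TP plus extra FPs
        for i, ref in enumerate(truth):
            if i not in matched and matches(pred, ref, tol):
                matched.add(i)
                tp += 1
                break
    fp = len(predicted) - tp
    fn = len(truth) - tp
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Greedy one-to-one matching keeps a tool from earning several true positives by reporting the same circle repeatedly. An optimal assignment would be stricter when tolerance windows overlap.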
Results
We found that detection accuracy was highly influenced by sequencing depth, alignment algorithm, and experimental enrichment protocols. $\Delta$CJ proved effective in flagging potential false-positive circles and indicated improved accuracy for the Intersect (circles detected by all tools) and Rosette (circles detected by $\ge$2 tools) combinations.
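The Intersect and Rosette combinations are both support-count thresholds over the tools' call sets. Below is a minimal sketch assuming circles are keyed by exact `(chrom, start, end)` coordinates. A real pipeline would likely merge calls within a breakpoint tolerance first. The function names and the tool names in the usage are hypothetical.

```python
from collections import Counter

Key = tuple[str, int, int]  # (chromosome, start, end) of a called circle


def support_counts(calls_by_tool: dict[str, set[Key]]) -> Counter:
    # Count how many tools report each circle; sets ensure one vote per tool
    counts: Counter = Counter()
    for calls in calls_by_tool.values():
        counts.update(calls)
    return counts


def consensus(calls_by_tool: dict[str, set[Key]], min_tools: int) -> set[Key]:
    # Keep circles reported by at least `min_tools` tools
    counts = support_counts(calls_by_tool)
    return {key for key, n in counts.items() if n >= min_tools}


def intersect(calls_by_tool: dict[str, set[Key]]) -> set[Key]:
    # Intersect: circles detected by every tool
    return consensus(calls_by_tool, min_tools=len(calls_by_tool))


def rosette(calls_by_tool: dict[str, set[Key]]) -> set[Key]:
    # Rosette: circles detected by at least two tools
    return consensus(calls_by_tool, min_tools=2)


calls = {
    "toolA": {("chr1", 100, 900), ("chr2", 50, 400)},
    "toolB": {("chr1", 100, 900), ("chr3", 10, 300)},
    "toolC": {("chr1", 100, 900), ("chr2", 50, 400)},
}
print(sorted(intersect(calls)))  # [('chr1', 100, 900)]
print(sorted(rosette(calls)))    # [('chr1', 100, 900), ('chr2', 50, 400)]
```

The Discussion's suggestion of combining $\ge$3 tools corresponds to `consensus(calls, min_tools=3)`. With exactly three tools, as in this toy input, that threshold coincides with Intersect.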
Discussion
This study offers a broad evaluation of circle detection tools and suggests that combining $\ge$3 tools is necessary for accurate prediction. These insights will inform future experimental design and data analysis pipelines in both experimental and clinical settings.
This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag017), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:
Reviewer 2:
This manuscript presents a systematic and carefully executed benchmark of eccDNA and circRNA detection tools using both in silico simulations and biological datasets. The introduction of CircleSim and, in particular, the ∆CJ metric as a proxy for detection quality in the absence of ground truth is a notable conceptual contribution. The study is generally well designed, the analyses are extensive, and the conclusions are largely supported by the data. However, some points need to be addressed to strengthen the manuscript and avoid potential misinterpretation of the results.
- Provide a concise table summarizing tool versions, aligners, key parameters, etc. This would help readers attempting to replicate the benchmark.
- CircleSim is a useful contribution, but its biological realism requires clearer justification. Circles are generated uniformly across chromosomes and transcripts, yet real eccDNA and circRNA formation is known to be biased by chromatin state, transcriptional activity, repetitive elements, and genomic architecture. The authors should explicitly discuss which biological biases CircleSim does not capture and explain how this affects the interpretation of precision/recall values.
- The conclusion that higher sequencing coverage increases false positives is intriguing but potentially misleading if generalized. The observed decrease in F-score at high coverage appears to be driven by the accumulation of low-confidence split reads and by tool-specific sensitivity to noise. The manuscript should clarify that high coverage is not detrimental per se; rather, current algorithms lack sufficient FP control at high depth without stricter filtering. Reframing this as a tool- and filter-dependent phenomenon would prevent misinterpretation.
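The point that the F-score drop depends on filtering can be illustrated by sweeping a minimum split-read threshold and rescoring at each value. Below is a minimal sketch. It assumes each call carries a junction split-read count (the hypothetical `split_reads` field; real tools expose their own support fields) and that calls match the truth set by exact coordinates. The thresholds and inputs are toy values, not results from the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    chrom: str
    start: int
    end: int
    split_reads: int  # hypothetical: reads spanning the circle junction


def filter_by_support(calls: list[Call], min_split_reads: int) -> list[Call]:
    # Drop low-confidence calls whose junction support is below the threshold
    return [c for c in calls if c.split_reads >= min_split_reads]


def f_score_sweep(calls: list[Call], truth: set[tuple[str, int, int]],
                  thresholds: list[int]) -> dict[int, float]:
    """F-score after filtering at each threshold (exact-coordinate matching)."""
    results: dict[int, float] = {}
    for t in thresholds:
        kept = {(c.chrom, c.start, c.end) for c in filter_by_support(calls, t)}
        tp = len(kept & truth)
        precision = tp / len(kept) if kept else 0.0
        recall = tp / len(truth) if truth else 0.0
        # When tp > 0 both precision and recall are positive, so no zero division
        results[t] = 2 * precision * recall / (precision + recall) if tp else 0.0
    return results
```

In a toy input where the spurious calls have only one or two supporting reads, raising the threshold removes them and the F-score rises. On real data, a stricter threshold would also discard weakly supported true circles, so the cutoff trades recall for precision and would need tuning per tool and depth.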
Reviewer 1:
The authors have adequately addressed all of my comments and concerns, and while there are future directions that should be explored (e.g., the effect of library prep on eccDNA detection, the effect of sequencing artifacts on eccDNA detection), I agree with the authors that those tasks are slightly outside the scope of their existing manuscript. Lines 105-106 contain a minor grammatical error. I have no further suggestions and recommend the manuscript for publication, as the comparisons performed here will help researchers in the field understand which tools to use.
