Benchmarking RNA-seq Tools for Real-World Diagnostic Applications
Abstract
Background: Pediatric neuromuscular diseases are genetically and clinically heterogeneous, and a substantial proportion remain without a definitive genetic diagnosis despite available clinical molecular testing. RNA sequencing (RNA-seq) can complement genome or exome sequencing to elucidate or identify the functional impact of variants of uncertain significance, but manual analysis is limited to candidate DNA variants or phenotype-driven gene lists. Open-source computational tools have been developed to analyze RNA-seq data systematically and without bias for aberrant splicing, aberrant expression, or allelic imbalance. However, best practices for using these tools have yet to be established.

Methods: To assess the performance of selected tools, we collected RNA-seq data from 97 previously diagnosed samples to establish a truth set for benchmarking. Pathogenic variants were categorized as true positives, with confirmed aberrant RNA events, or true negatives, with no transcriptomic effect. We assessed the performance of eight commonly used tools for splicing, expression, and allelic imbalance analysis. We then applied the optimal strategy to 74 undiagnosed RNA-seq samples to identify new candidate diagnoses.

Results: Across 68 diagnosed probands with aberrant RNA events, the tools correctly identified 28 diagnoses. Splicing analysis tools provided most of the findings, but allelic imbalance tools uniquely identified 4, underscoring their value. Conversely, the false positive rate was highest for the splicing tools and lowest for expression analysis. Applying the tools identified candidate variants for only 9 of 74 undiagnosed patients.

Conclusions: Including RNA-seq tools can expedite variant prioritization, characterization, and interpretation in the diagnostic pipeline, but these tools remain complementary to manual analysis of loci where candidate variants were identified by DNA sequencing.
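
The benchmarking idea described in the Methods (scoring each tool's calls against a truth set of true positives and true negatives) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `benchmark`, the per-sample definition of the false positive rate, and the toy sample IDs are all assumptions made for illustration. Only the counts 28 of 68 and 9 of 74 come from the abstract.

```python
def benchmark(flagged, positives, negatives):
    """Score one tool's calls against a truth set (hypothetical helper).

    flagged:   sample IDs the tool reported as aberrant
    positives: truth-set samples with a confirmed aberrant RNA event
    negatives: truth-set samples with no transcriptomic effect
    """
    flagged, positives, negatives = set(flagged), set(positives), set(negatives)
    tp = len(flagged & positives)  # true positives recovered by the tool
    fp = len(flagged & negatives)  # true negatives wrongly flagged
    return {
        "sensitivity": tp / len(positives) if positives else float("nan"),
        "false_positive_rate": fp / len(negatives) if negatives else float("nan"),
    }


# Toy, made-up sample IDs purely for illustration (not data from the study).
toy = benchmark(
    flagged={"S1", "S2", "N1"},
    positives={"S1", "S2", "S3", "S4"},
    negatives={"N1", "N2"},
)

# Proportions implied by the counts reported in the abstract.
overall_sensitivity = 28 / 68  # diagnoses recovered among probands with aberrant RNA events
candidate_yield = 9 / 74       # undiagnosed patients with a candidate variant

print(toy)
print(f"overall sensitivity: {overall_sensitivity:.1%}")
print(f"candidate yield: {candidate_yield:.1%}")
```

The study may count false positives per event rather than per sample, so the per-sample rate used here should be read as one possible convention, not the definition used in the paper.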