Comprehensive benchmarking with guidelines for analyzing transposable element-derived RNA expression
Abstract
Transposable element-derived RNAs (teRNAs) are increasingly recognized as having fundamental or pathogenic roles, especially in humans. Despite the rapid development of computational methods, best practices for accurate identification and quantification of teRNAs are currently lacking, owing to the difficulty of evaluation. Here we present a benchmark of 16 representative tools on 120 simulated datasets and 60 real-world paired datasets (comprising both long- and short-read data), evaluating the performance of teRNA identification or quantification at the family, unit, exon, and transcript levels. Our findings show that the exon level offers a trade-off between accuracy and resolution for teRNA analysis, and that the strengths and weaknesses of the evaluated methods depend on the level of analysis. Building on our benchmarking results, we present decision-tree-style guidelines and develop an integrated best-practice pipeline that can serve as a basis for future functional research. In addition, our evaluation framework provides a gold standard for developing and benchmarking better computational tools in the field.