Benchmarking siRNA Prediction: The Role of Representation and Validation Strategies

Aparajita Karmakar
Abdulhamid Merii
Angus Weir
Grzegorz Kudla
Mark Basham
Alex Lubbock


Abstract

Small interfering RNAs (siRNAs) offer transformative potential for targeted therapeutics, yet designing highly effective, non-toxic candidates is hindered by the risk of off-target effects and RNA instability. A critical flaw in in silico prediction models is pervasive data leakage in cross-validation protocols, which artificially inflates performance metrics and produces untrustworthy results. To address this, we developed a rigorous framework that eliminates data leakage through strict cross-validation, uses z-curves (3D representations of RNA physico-chemical properties) for context-aware sequence encoding, and identifies the sequence regions most critical for efficacy. Our model achieves an AUC of 0.845 under leakage-free validation and outperforms prior work while running 380x faster, demonstrating that a better sequence representation matters more than model complexity. Crucially, we show how experimental variability and cross-validation choices directly affect model reliability, establishing the first benchmarked methods for robust siRNA efficacy prediction. This work provides a foundation for trustworthy sequence design and validation in RNA therapeutics.
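The abstract names three technical ingredients: z-curve encoding, leakage-free cross-validation, and AUC as the metric. The sketch below illustrates each under stated assumptions and is not the authors' pipeline. It uses the standard cumulative Z-curve definition (x: purine vs pyrimidine, y: amino vs keto, z: weak vs strong hydrogen bonding); the paper's exact variant is not given here. It assumes leakage is avoided by grouping siRNAs on a hypothetical target-gene label so that no group appears in both training and test folds, which is one common strategy but not confirmed by the source. The helpers `z_curve`, `group_kfold` and `roc_auc` are hypothetical names.

```python
from collections import defaultdict

# Standard Z-curve increments per nucleotide (assumed variant):
#   x = (A+G) - (C+U)  purine vs pyrimidine
#   y = (A+C) - (G+U)  amino vs keto
#   z = (A+U) - (G+C)  weak vs strong hydrogen bonding
_STEP = {
    "A": (1, 1, 1),
    "G": (1, -1, -1),
    "C": (-1, 1, -1),
    "U": (-1, -1, 1),
}


def z_curve(seq):
    """Return cumulative (x, y, z) coordinates, one triple per position."""
    x = y = z = 0
    coords = []
    for base in seq.upper().replace("T", "U"):  # accept DNA-style input
        dx, dy, dz = _STEP[base]
        x += dx
        y += dy
        z += dz
        coords.append((x, y, z))
    return coords


def group_kfold(groups, k):
    """Yield (train, test) index lists; each group lands in exactly one fold.

    Hypothetical grouping key (e.g. target gene) supplied by the caller.
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    folds = [[] for _ in range(k)]
    # Place the largest groups first, each into the currently smallest fold.
    for g in sorted(by_group, key=lambda g: -len(by_group[g])):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[g])
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test


def roc_auc(labels, scores):
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(z_curve("AGCU"))  # [(1, 1, 1), (2, 0, 0), (1, 1, -1), (0, 0, 0)]
    genes = ["g1", "g1", "g2", "g3"]  # hypothetical target-gene labels
    for train, test in group_kfold(genes, 2):
        print(train, test)
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.2]))  # 0.75
```

The design point is that the split is made over groups, not rows: if near-identical siRNAs against the same transcript fall on both sides of a split, a model can score well by memorizing targets, which is the kind of inflation the abstract describes.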

Version published on bioRxiv (DOI: 10.64898/2026.05.12.724560), May 14, 2026

