Machine Learning Reveals Intrinsic Determinants of siRNA Efficacy

Christian Mandelli
Giulia Crippa

Abstract

Small interfering RNAs (siRNAs) are widely used in therapeutics and agriculture for sequence-specific gene silencing. However, siRNA efficacy remains difficult to predict due to complex dependencies on sequence, structure, and thermodynamic properties. Existing computational tools largely rely on heuristic rules or pre-scored features, limiting generalizability and biological interpretability. Here, we present a machine learning model to predict siRNA efficacy directly from intrinsic antisense sequence features. Using a dataset of 2,428 experimentally validated siRNAs, we developed a comprehensive feature set that encompasses sequence composition, regulatory motifs, thermodynamic parameters, and structural complexity. We trained and evaluated multiple models for both regression and classification tasks. Support Vector Regression (SVR) achieved the best regression performance overall, with a predictive accuracy of R = 0.719 and R² = 0.516, while logistic regression achieved the best classification results with ROC = 0.886 and F1 = 0.809 using a combination of composition, motif, and structural features. Among all features, position-specific nucleotides were the strongest predictors of efficacy, with a uracil at the 5′ antisense end (P1_U) and an adenine at the 3′ end (P19_A) showing the highest importance, consistent with known mechanisms of strand selection and RISC loading. Our approach improves both predictive power and biological interpretability compared to existing methods, eliminating reliance on external scoring functions. The resulting framework supports rational siRNA design for therapeutic applications, functional genomics, and non-transgenic crop protection strategies.
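
To make the position-specific features concrete, the following is a minimal sketch of how indicators such as P1_U and P19_A might be encoded from an antisense sequence. It is not the authors' implementation: the one-hot scheme, the 19-nt length (inferred from P19 being described as the 3′ end), the helper names `position_features` and `gc_content`, and the example sequence are all assumptions made for illustration.

```python
from typing import Dict

NUCLEOTIDES = "ACGU"


def _normalize(antisense: str, length: int) -> str:
    """Upper-case, convert DNA-style T to U, and validate the first `length` bases."""
    seq = antisense.upper().replace("T", "U")
    if len(seq) < length:
        raise ValueError(f"expected at least {length} nt, got {len(seq)}")
    core = seq[:length]
    invalid = set(core) - set(NUCLEOTIDES)
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return core


def position_features(antisense: str, length: int = 19) -> Dict[str, int]:
    """One-hot indicators named P<position>_<base>, positions counted from the 5' end.

    Assumption: a 19-nt antisense core, so P19 is the 3'-most position.
    """
    core = _normalize(antisense, length)
    features: Dict[str, int] = {}
    for position, base in enumerate(core, start=1):
        for nucleotide in NUCLEOTIDES:
            features[f"P{position}_{nucleotide}"] = int(base == nucleotide)
    return features


def gc_content(antisense: str, length: int = 19) -> float:
    """Fraction of G or C bases, a simple sequence-composition feature."""
    core = _normalize(antisense, length)
    return sum(base in "GC" for base in core) / length


# Hypothetical antisense sequence (not taken from the dataset).
example = "UUCGAAGUACUCAGCGUAA"
feats = position_features(example)
print(feats["P1_U"], feats["P19_A"], round(gc_content(example), 3))
```

A feature dictionary like this could be combined with motif, thermodynamic, and structural descriptors before being passed to a regressor or classifier; which features the authors actually used beyond those named in the abstract is not specified here.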

Version published to 10.1101/2025.08.11.667724 on bioRxiv
Aug 15, 2025

Related articles

Benchmarking Reveals the Superiority of Nucleic Acid Foundation Models in Predicting lncRNA Coding Potential

This article has 5 authors:
1. Yu Yang
2. Liping Ren
3. Juan Feng
4. Yang Zhang
5. Tianyuan Liu
This article has no evaluations
Latest version Dec 17, 2025

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations
Latest version Dec 11, 2025

Multi-Modal Ensemble Learning for TLR4 Binding Prediction: Addressing Data Scarcity and Leakage in Small Molecule Drug Discovery

This article has 3 authors:
1. Brandon Yee
2. Maximilian Rutkowski
3. Wilson Collins
This article has no evaluations
Latest version Jan 28, 2026
