Machine Learning Reveals Intrinsic Determinants of siRNA Efficacy

Abstract

Small interfering RNAs (siRNAs) are widely used in therapeutics and agriculture for sequence-specific gene silencing. However, siRNA efficacy remains difficult to predict due to complex dependencies on sequence, structure, and thermodynamic properties. Existing computational tools largely rely on heuristic rules or pre-scored features, limiting generalizability and biological interpretability. Here, we present a machine learning model to predict siRNA efficacy directly from intrinsic antisense sequence features. Using a dataset of 2,428 experimentally validated siRNAs, we developed a comprehensive feature set that encompasses sequence composition, regulatory motifs, thermodynamic parameters, and structural complexity. We trained and evaluated multiple models for both regression and classification tasks. Support Vector Regression (SVR) achieved the best regression performance overall, with a predictive accuracy of R = 0.719 and R² = 0.516, while logistic regression achieved the best classification results with ROC = 0.886 and F1 = 0.809 using a combination of composition, motif, and structural features. Among all features, position-specific nucleotides were the strongest predictors of efficacy, with a uracil at the 5′ antisense end (P1_U) and an adenine at the 3′ end (P19_A) showing the highest importance, consistent with known mechanisms of strand selection and RISC loading. Our approach improves both predictive power and biological interpretability compared to existing methods, eliminating reliance on external scoring functions. The resulting framework supports rational siRNA design for therapeutic applications, functional genomics, and non-transgenic crop protection strategies.
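
To make the position-specific features concrete, the following is a minimal sketch of how one-hot positional indicators such as P1_U and P19_A, along with simple composition features, might be derived from an antisense strand. It is not the authors' pipeline: the function name `antisense_features`, the `frac_*` and `gc_content` feature names, the fixed 19-nt length, and the example sequence are all assumptions made for illustration. Only the `P{position}_{nucleotide}` naming follows the abstract.

```python
from collections import Counter

NUCLEOTIDES = "ACGU"
ANTISENSE_LENGTH = 19  # assumption: 19-nt antisense strand, inferred from the P19_A feature name


def antisense_features(antisense: str) -> dict[str, float]:
    """Build illustrative sequence features for one antisense strand (5' to 3').

    Hypothetical helper: produces one-hot positional features named
    P{position}_{nucleotide} plus nucleotide fractions and GC content.
    """
    # Accept DNA-style input by mapping T to U.
    seq = antisense.upper().replace("T", "U")
    if len(seq) != ANTISENSE_LENGTH or set(seq) - set(NUCLEOTIDES):
        raise ValueError(f"expected {ANTISENSE_LENGTH} nucleotides from {NUCLEOTIDES!r}")

    features: dict[str, float] = {}

    # Position-specific one-hot encoding: P1_* is the 5' end, P19_* the 3' end.
    for pos, base in enumerate(seq, start=1):
        for nt in NUCLEOTIDES:
            features[f"P{pos}_{nt}"] = 1.0 if base == nt else 0.0

    # Simple composition features.
    counts = Counter(seq)
    for nt in NUCLEOTIDES:
        features[f"frac_{nt}"] = counts[nt] / len(seq)
    features["gc_content"] = (counts["G"] + counts["C"]) / len(seq)

    return features


# Made-up example sequence with U at the 5' end and A at the 3' end.
example = antisense_features("UUCGAAGUACUCAGCGUAA")
print(example["P1_U"], example["P19_A"], round(example["gc_content"], 3))
```

A feature dictionary of this kind could then be stacked into a matrix and passed to a regressor or classifier. The motif, thermodynamic, and structural features mentioned in the abstract are omitted here because the abstract does not specify how they are computed.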