Systematic feature and architecture evaluation reveals tokenized learned embeddings enhance siRNA efficacy prediction

Rory Coffey

Abstract

Recent advances in machine learning have improved the prediction of siRNA efficacy, with graph neural networks and transformer-based encodings leading the way. However, existing models still face challenges, including potential inaccuracies in thermodynamic feature calculations (such as incorrect strand selection for siRNA-mRNA Gibbs free energy), ineffective use of available datasets, and a lack of systematic model refinement. In this study, I systematically evaluated the predictive power of individual features and neural network architectures to identify the most effective configurations. This process led to the development of RN.Ai-Predict, a model built upon a tokenized learned embedding for nucleotide sequences. This work demonstrates that a methodical approach to feature selection and hyperparameter tuning, particularly one favoring learned embeddings, can yield a more accurate and reliable model for predicting siRNA efficacy, one that generalizes better than more complex architectures.
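
To make the phrase "tokenized learned embedding for nucleotide sequences" concrete, here is a minimal sketch of the general idea, not the RN.Ai-Predict implementation. The vocabulary, the 21-nucleotide length, the embedding dimension, and the example sequence are all assumptions chosen for illustration; the abstract specifies none of them. Each nucleotide is mapped to an integer token, and each token indexes a row in an embedding table whose values would be learned during training (here they are only randomly initialized).

```python
import random

# Hypothetical vocabulary: one token per RNA nucleotide, plus padding and unknown.
VOCAB = {"<pad>": 0, "<unk>": 1, "A": 2, "C": 3, "G": 4, "U": 5}


def tokenize(seq, length=21):
    """Map an RNA sequence to a fixed-length list of integer token IDs."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    ids = [VOCAB.get(base, VOCAB["<unk>"]) for base in seq[:length]]
    # Right-pad short sequences so every input has the same length.
    return ids + [VOCAB["<pad>"]] * (length - len(ids))


def init_embedding(vocab_size, dim, seed=0):
    """Create a randomly initialized embedding table.

    In a real model these rows are trainable parameters updated by
    gradient descent; this sketch only shows the lookup structure.
    """
    rng = random.Random(seed)
    table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    table[VOCAB["<pad>"]] = [0.0] * dim  # padding contributes nothing
    return table


def embed(ids, table):
    """Look up one embedding vector per token ID."""
    return [table[i] for i in ids]


# Made-up 21-nt sequence, used only to exercise the functions.
tokens = tokenize("GUGAUGUCUACCUCAGAAUUU")
table = init_embedding(len(VOCAB), dim=4)
vectors = embed(tokens, table)  # 21 vectors of dimension 4
```

In a deep learning framework the table would typically be an embedding layer trained jointly with the rest of the network, so the model learns its own nucleotide representation rather than relying on fixed one-hot or hand-crafted features.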

Version published to 10.1101/2025.08.12.669916 on bioRxiv
Aug 15, 2025

Related articles

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations
Latest version Dec 11, 2025

A Survey on Efficient Protein Language Models

This article has 8 authors:
1. Shouren Wang
2. Debargha Ganguly
3. Vinooth Kulkarni
4. Wang Yang
5. Zhuoran Qiao
6. Daniel Blankenberg
7. Vipin Chaudhary
8. Xiaotian Han
This article has no evaluations
Latest version Dec 24, 2025

Decoupled Representation Learning Improves Generalization in CRISPR Off-Target Prediction

This article has 2 authors:
1. Nyla Bhargava
2. Aditya Goswami
This article has no evaluations
Latest version Jan 18, 2026
