Meta-learning on property matrices and LLM embeddings enables state-of-the-art prediction of gene knockdown by modified siRNAs

Ivan Golovkin
Denis Shatkovskii
Nikita Serov

Abstract

Six small interfering RNAs (siRNAs) have been approved as therapeutics, making them promising nanosystems owing to their selective gene knockdown activity. siRNA design is complex and depends on many factors; chemical modifications in particular are crucial for improving half-life and stability. Machine learning (ML) has enabled more efficient analysis of siRNA data, including prediction of efficacy and off-target effects. This work proposes a novel pipeline for predicting the gene knockdown activity of chemically modified siRNAs across the whole range of activities, leveraging both descriptors derived from chemical composition-aware property matrices of the siRNA and large language model (LLM) embeddings for target gene encoding. Several general-purpose and domain-specific fine-tuned LLMs were benchmarked on the target task; the general-purpose Mistral 7B model outperformed even the models pre-trained on genomic data. The proposed model, based on a meta-learning mechanism, successfully mitigates the data imbalance towards moderately-to-highly active constructs and achieves state-of-the-art (SOTA) quality with R² = 0.84 and RMSE = 12.27% on unseen data; the probabilistic outputs of classifiers with F-scores of up to 0.92 were used as additional descriptors. By filling the gap in advanced chemical composition-aware siRNA design, our model aims to improve the efficacy of siRNA-based therapies under development.
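
For readers unfamiliar with the two regression metrics quoted in the abstract, the following is a minimal sketch of how the coefficient of determination (R²) and the root-mean-square error (RMSE) are conventionally computed. The function names and the knockdown values are hypothetical illustrations, not data or code from the preprint, and the authors' exact evaluation procedure may differ.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the target (here, % knockdown)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted knockdown activities (%), for illustration only.
measured = [10.0, 35.0, 60.0, 85.0, 95.0]
predicted = [20.0, 30.0, 55.0, 80.0, 90.0]

print(f"RMSE = {rmse(measured, predicted):.2f}%")
print(f"R2   = {r2_score(measured, predicted):.3f}")
```

Because RMSE is expressed in the units of the target, a value such as 12.27% can be read as a typical prediction error in percentage points of knockdown, while R² summarizes the share of variance in measured activity that the predictions account for.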

Version published to 10.21203/rs.3.rs-7336200/v1 on Research Square
Sep 8, 2025

Related articles

Decoupled Representation Learning Improves Generalization in CRISPR Off-Target Prediction

This article has 2 authors:
1. Nyla Bhargava
2. Aditya Goswami
This article has no evaluations. Latest version: Jan 18, 2026

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations. Latest version: Dec 11, 2025

Closed-Loop Workflow of High-Entropy Materials Discovery: Efficient and Accurate Synthesizability Prediction via Domain-Specific Local LLMs

This article has 3 authors:
1. Yeongjun Yoon
2. Geun Ho Gu
3. Kyeounghak Kim
This article has no evaluations. Latest version: Dec 19, 2025
