Meta-learning on property matrices and LLM embeddings enables state-of-the-art prediction of gene knockdown by modified siRNAs

Abstract

Six small interfering RNAs (siRNAs) have been approved as therapeutics, making them promising nanosystems owing to their selective gene knockdown activity. siRNA design is complex and depends on many factors; among these, chemical modifications are crucial for improving half-life and stability. Machine learning (ML) has enabled more efficient analysis of siRNA data, including the prediction of efficacy and off-target effects. This work proposes a novel pipeline for predicting the gene knockdown activity of chemically modified siRNAs across the whole range of activities. The pipeline leverages both chemical composition-aware property matrices as siRNA descriptors and large language model (LLM) embeddings to encode the target gene. Several general-purpose and domain-specific fine-tuned LLMs were benchmarked on the target task, and the general-purpose Mistral 7B model outperformed even models pre-trained on genomic data. The proposed model, based on a meta-learning mechanism, successfully mitigates the data imbalance towards moderately to highly active constructs and achieves state-of-the-art (SOTA) quality, with R² = 0.84 and RMSE = 12.27% on unseen data; the probabilistic outputs of classifiers reaching F-scores up to 0.92 were used as additional descriptors. By filling the gap in advanced chemical composition-aware siRNA design, our model aims to improve the efficacy of siRNA-based therapies.
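The abstract reports R² and RMSE without defining them. As a minimal sketch, assuming the standard definitions (coefficient of determination and root-mean-square error) and knockdown activity expressed in percent, the two metrics could be computed as follows. The function names and the sample values are hypothetical and are not taken from the article.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the targets."""
    n = len(y_true)
    squared_errors = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared_errors / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical knockdown activities in percent (illustrative only).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]

print(f"RMSE = {rmse(y_true, y_pred):.2f}%")
print(f"R2   = {r2_score(y_true, y_pred):.3f}")
```

Under these definitions, an RMSE of 12.27% would correspond to a typical prediction error of about 12 percentage points of knockdown activity, provided activity is measured on a 0 to 100% scale, which the abstract does not state explicitly.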
