ChromeCRISPR - A High Efficacy Hybrid Machine Learning Model for CRISPR/Cas On-Target Predictions

Abstract

Genome editing has the potential to treat genetic disorders at the source by modifying defective DNA through the intentional insertion, deletion, or substitution of genomic content. Among genome editing technologies, CRISPR/Cas (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein) is considered the gold standard. CRISPR/Cas uses a single guide RNA (sgRNA) to direct the Cas nuclease to a target DNA region. Because small RNA molecules are easy to create, the CRISPR/Cas complex can be directed to an arbitrary DNA sequence, which makes it a versatile tool. The efficacy of the complex depends on the ability of the sgRNA to bind to a complementary DNA sequence, and this ability varies with the sequence. A major challenge is therefore finding sgRNA sequences with good efficacy. This is where computational models can aid scientists: by predicting sgRNA activity, they narrow the search for the optimal sgRNA. We used a large new dataset to build several machine learning architectures and compare their ability to predict on-target CRISPR/Cas activity. Additionally, we explored how adding GC content affects our sgRNA activity predictions. Our novel hybrid model, ChromeCRISPR, combines the strengths of Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). It outperformed state-of-the-art models, including DeepHF and AttCrispr, establishing a new benchmark for predictive accuracy in CRISPR/Cas9 efficacy prediction.
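To make the GC content feature concrete, the following is a minimal sketch of how an sgRNA sequence might be turned into model inputs. It is an illustration under stated assumptions, not the preprocessing used for ChromeCRISPR: the function names (`gc_content`, `one_hot`, `encode_guide`), the A/C/G/T channel order, the choice to append GC content as a single trailing scalar, and the example 20-nt guide are all hypothetical.

```python
from typing import List

# Assumed channel order for the one-hot encoding (not taken from the paper).
BASES = "ACGT"


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence, in the range [0, 1]."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for base in seq if base in "GC") / len(seq)


def one_hot(seq: str) -> List[List[int]]:
    """One row per position, one column per base in BASES.

    RNA input is accepted by treating U as T; any other unknown
    character raises ValueError.
    """
    rows = []
    for base in seq.upper().replace("U", "T"):
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


def encode_guide(seq: str) -> List[float]:
    """Flattened one-hot encoding followed by GC content as one extra feature."""
    flat = [float(v) for row in one_hot(seq) for v in row]
    return flat + [gc_content(seq)]


# Hypothetical 20-nt guide used only for illustration.
guide = "GACGTTACCGGATCAAGCTT"
features = encode_guide(guide)
```

For a 20-nt guide this yields 81 values: 80 from the one-hot encoding and one for GC content. A sequence model such as a CNN or RNN would more likely consume the unflattened position-by-base matrix, with GC content supplied to a later dense layer; the flattened form here is only the simplest way to show both features side by side.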
