ChromeCRISPR - A High Efficacy Hybrid Machine Learning Model for CRISPR/Cas On-Target Predictions
Abstract
Genome editing has the potential to treat genetic disorders at the source by modifying defective DNA through the intentional insertion, deletion, or substitution of genomic content. Among genome editing technologies, CRISPR/Cas (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein) is considered the gold standard. CRISPR/Cas uses a single guide RNA (sgRNA) to direct the Cas nuclease to a target DNA region. Because small RNA molecules are easy to create, the CRISPR/Cas complex can be directed to an arbitrary DNA sequence, which makes it a versatile tool. The efficacy of the complex depends on the ability of the sgRNA to bind to a complementary DNA sequence, and this ability varies with the sequence. A major challenge is therefore finding sgRNA sequences with good efficacy. Computational models can aid scientists here by predicting sgRNA activity, which narrows the search for the optimal sgRNA. We used a large new dataset to build several machine learning architectures and compare their ability to predict on-target CRISPR/Cas activity. Additionally, we explored how adding GC content affects our sgRNA activity predictions. Our novel hybrid model, ChromeCRISPR, combines the strengths of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) and outperformed state-of-the-art models, including DeepHF and AttCrispr, establishing a new benchmark for predictive accuracy in CRISPR/Cas9 efficacy prediction.
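As an illustration of the kind of input such models consume, the sketch below computes the GC content of a guide sequence and a one-hot encoding of its bases. It is a minimal example under stated assumptions, not the ChromeCRISPR preprocessing pipeline: the function names `gc_content` and `one_hot`, the A/C/G/T column order, and the example guide sequence are all hypothetical and are not taken from the preprint.

```python
from typing import List

# Assumed column order for the one-hot encoding (hypothetical choice).
BASES = "ACGT"


def gc_content(sgrna: str) -> float:
    """Return the fraction of G and C bases in a guide sequence."""
    seq = sgrna.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for base in seq if base in "GC") / len(seq)


def one_hot(sgrna: str) -> List[List[int]]:
    """Encode each base as a 4-element indicator vector in A, C, G, T order.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    return [[int(base == b) for b in BASES] for base in sgrna.upper()]


if __name__ == "__main__":
    guide = "GACGTTACCGGATCAAGCTT"  # made-up 20-nt guide, for illustration only
    print(gc_content(guide))        # scalar feature
    print(len(one_hot(guide)))      # one row per base
```

In a setup like this, the one-hot matrix would typically feed the sequence branch of a network, while the scalar GC fraction would be supplied as an additional feature; how ChromeCRISPR actually combines them is not specified in the abstract.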