DeepEmbCas9: Cas9 coevolution and sgRNA structural information for CRISPR-Cas9 cleavage activity prediction


Abstract

The development of CRISPR-Cas9 cleavage activity prediction tools hinges on data produced from high-throughput guide-target lentiviral library screens for different Cas9 variants. However, most such tools remain limited to predictions for one or a few Cas9 variants, making it difficult to quantify the effects of Cas9 residues on cleavage activity. To bridge this gap, we introduce four interpretable DeepEmbCas9 models (DeepEmbCas9, DeepEmbCas9-MVE, DeepEnsEmbCas9 naive, and DeepEnsEmbCas9) for cleavage activity prediction across 40 type II-A and II-C Cas9 variants, leveraging protein and RNA language model embeddings to encode Cas9 and sgRNA, respectively. Among the four neural network models, DeepEnsEmbCas9 naive performed best in both in-distribution and out-of-distribution settings: it outperformed individual Cas9 cleavage activity prediction tools on 18 of 51 and 17 of 48 benchmark test sets, respectively, and performed comparably on the rest. For uncertainty quantification, DeepEnsEmbCas9 yields quantile-calibrated uncertainty estimates with only a minimal performance drop relative to DeepEnsEmbCas9 naive. SHAP importance analysis on DeepEmbCas9 reaffirms the importance of Cas9-target PAM binding as a first step in Cas9 cleavage, and reveals the L2 linker and PLL-WED-PI as important Cas9 domains modulating DeepEmbCas9’s predicted activity change when introducing increased-fidelity and PAM-altering Cas9 mutations, respectively. Our findings demonstrate the usefulness of protein language model embeddings in uncertainty-aware Cas9 cleavage activity prediction. More generally, the DeepEmbCas9 models serve as an initial step towards cleavage activity prediction modelling for the whole Cas9 protein family.
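
The abstract does not say how the ensemble members are combined or how quantile calibration is checked, so the following is only an illustrative sketch and not the authors' implementation. It assumes that "MVE" refers to mean-variance estimation (each member predicts a Gaussian mean and variance), that members are merged with the usual mixture-moment formulas, and that calibration is judged by comparing a nominal quantile level with the observed fraction of targets falling below it. The function names `combine_mve_ensemble`, `gaussian_cdf`, and `observed_coverage` are hypothetical.

```python
import math
from statistics import fmean


def combine_mve_ensemble(means, variances):
    """Merge per-member Gaussian predictions into one mean and variance.

    Assumption: equal-weight mixture of the members' Gaussians.
    """
    mu = fmean(means)
    # Law of total variance: average member variance plus spread of member means.
    var = fmean(v + m * m for m, v in zip(means, variances)) - mu * mu
    return mu, var


def gaussian_cdf(y, mu, var):
    """CDF of a normal distribution N(mu, var) evaluated at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / math.sqrt(2.0 * var)))


def observed_coverage(targets, mus, variances, level):
    """Fraction of targets at or below the predicted `level` quantile."""
    hits = sum(
        gaussian_cdf(y, m, v) <= level
        for y, m, v in zip(targets, mus, variances)
    )
    return hits / len(targets)


# Hypothetical numbers: two ensemble members predicting one guide's activity.
mu, var = combine_mve_ensemble([0.2, 0.4], [0.01, 0.03])
```

Under these assumptions, a model is quantile-calibrated when `observed_coverage(..., level=p)` stays close to `p` across a grid of levels on held-out data.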
