Structure-Based Classification of CRISPR/Cas9 Proteins: A Machine Learning Approach to Elucidating Cas9 Allostery

Abstract

The CRISPR/Cas9 system is a powerful gene-editing tool whose specificity and stability rely on complex allosteric regulation. Understanding this regulation is essential for developing high-fidelity Cas9 variants with reduced off-target effects. Here, we introduce a novel structure-based machine learning (ML) approach to systematically identify long-range allosteric networks in Cas9. Our ML model was trained on all available Cas9 structures, ensuring a comprehensive representation of the Cas9 structural landscape. We then applied this model to Streptococcus pyogenes Cas9 (SpCas9) to demonstrate the feature selection process. Using Cα-Cα inter-residue distances, we mapped key allosteric networks and refined them through a two-stage SHAP feature selection (FS) strategy, reducing a vast feature space to 28 critical Lysine-Arginine (Lys-Arg) residue pairs that mediate SpCas9 interdomain communication, stability, and specificity. These Lys-Arg pairs initially shared an inter-residue distance of 46.5 Å, but molecular dynamics (MD) simulations revealed distinct stabilization behaviors, indicating a hierarchical allosteric network. Further mutational analysis of the double mutants R78A-K855A (M1) and R765A-K1246A (M2) identified an electrostatic valley, a stabilizing network in which positively charged residues interact with negatively charged DNA to maintain SpCas9 structural integrity. Disrupting this valley through direct (M2) or allosteric (M1) mutations destabilized the DNA-bound conformation of SpCas9, pointing to distinct pathways for improving SpCas9 specificity. This study provides a new framework for understanding allostery in Cas9 by integrating ML-driven structural analysis with MD simulations. By identifying key allosteric residues and introducing the electrostatic valley as a central concept, we offer a rational strategy for engineering high-fidelity Cas9 variants. Beyond Cas9, our approach can be applied to uncover allosteric hotspots in other enzymes, supporting studies of enzyme regulation and rational protein design.
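
The abstract names Cα-Cα inter-residue distances as the input features but does not describe how they are computed. The following is a minimal sketch of one plausible way to build such features, not the authors' pipeline. The function name `ca_distance_features` and the toy coordinates are hypothetical, and the Cα coordinates are assumed to have already been extracted from a structure file. For a protein of N residues, this yields N(N-1)/2 pairwise distances, which is why a feature selection step is needed afterwards.

```python
import numpy as np


def ca_distance_features(ca_coords):
    """Return (pairs, distances) for every unique residue pair i < j.

    ca_coords: array-like of shape (n_residues, 3) holding one C-alpha
    position per residue, in angstroms (assumed input format).
    """
    coords = np.asarray(ca_coords, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError("expected an (n_residues, 3) array of coordinates")

    # Pairwise difference vectors via broadcasting, then Euclidean norm.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Keep only the upper triangle (i < j): each pair is one feature.
    i, j = np.triu_indices(len(coords), k=1)
    pairs = list(zip(i.tolist(), j.tolist()))
    return pairs, dist[i, j]


# Toy example with three made-up "residues" (not real Cas9 coordinates).
toy = np.array([[0.0, 0.0, 0.0],
                [3.0, 4.0, 0.0],
                [0.0, 0.0, 12.0]])
pairs, feats = ca_distance_features(toy)
for (a, b), d in zip(pairs, feats):
    print(f"residues {a}-{b}: {d:.1f} A")
```

Each entry in `feats` would be one candidate feature for a downstream classifier. Ranking such features (for example with SHAP values, as the abstract describes) is a separate step not shown here.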
