Structure-Based Classification of CRISPR/Cas9 Proteins: A Machine Learning Approach to Elucidating Cas9 Allostery
Abstract
The CRISPR/Cas9 system is a powerful gene-editing tool. Its specificity and stability rely on complex allosteric regulation, and understanding this regulation is essential for developing high-fidelity Cas9 variants with reduced off-target effects. Here, we introduce a novel structure-based machine learning (ML) approach to systematically identify long-range allosteric networks in Cas9. Our ML model was trained on all available Cas9 structures, ensuring a comprehensive representation of the Cas9 structural landscape. We then applied this model to Streptococcus pyogenes Cas9 (SpCas9) to demonstrate the feature selection process. Using Cα-Cα inter-residue distances, we mapped key allosteric networks and refined them through a two-stage SHAP feature selection (FS) strategy, reducing a vast feature space to 28 critical Lysine-Arginine (Lys-Arg) residue pairs that mediate SpCas9 interdomain communication, stability, and specificity. These Lys-Arg pairs initially shared a 46.5 Å inter-residue distance, but molecular dynamics (MD) simulations revealed distinct stabilization behaviors, indicating a hierarchical allosteric network. Further mutational analysis of R78A-K855A (M1) and R765A-K1246A (M2) identified an electrostatic valley, a stabilizing network in which positively charged residues interact with negatively charged DNA to maintain SpCas9 structural integrity. Disrupting this valley through direct (M2) or allosteric (M1) mutations destabilized the DNA-bound conformation of SpCas9, leading to distinct pathways for improving SpCas9 specificity. This study provides a new framework for understanding allostery in Cas9, integrating ML-driven structural analysis with MD simulations. By identifying key allosteric residues and introducing the electrostatic valley as a central concept, we offer a rational strategy for engineering high-fidelity Cas9 variants.
Beyond Cas9, our approach can be applied to uncover allosteric hotspots in other enzymes and to support rational protein design.
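
As a rough illustration of the Cα-Cα inter-residue distance features mentioned above, the following minimal sketch computes pairwise Cα distances and keeps only Lys-Arg pairs. It is not the authors' pipeline: the toy coordinates, the `residues` layout, and the function names `ca_distance_features` and `lys_arg_pairs` are assumptions made for illustration, and the SHAP-based selection stage is not reproduced here.

```python
import math
from itertools import combinations

# Hypothetical toy input: (residue name, residue number, C-alpha xyz in angstroms).
# These coordinates are invented for illustration; they do not come from a Cas9 structure.
residues = [
    ("ARG", 1, (0.0, 0.0, 0.0)),
    ("LYS", 2, (3.0, 4.0, 0.0)),
    ("GLY", 3, (0.0, 0.0, 10.0)),
    ("LYS", 4, (6.0, 8.0, 0.0)),
]


def ca_distance_features(residues):
    """Return {(i, j): distance} for every unordered pair of residues."""
    features = {}
    for (_, i, xyz_i), (_, j, xyz_j) in combinations(residues, 2):
        features[(i, j)] = math.dist(xyz_i, xyz_j)  # Euclidean distance
    return features


def lys_arg_pairs(residues, features):
    """Keep only the pairs formed by one lysine and one arginine."""
    names = {number: name for name, number, _ in residues}
    return [
        pair for pair in features
        if {names[pair[0]], names[pair[1]]} == {"LYS", "ARG"}
    ]


features = ca_distance_features(residues)
pairs = lys_arg_pairs(residues, features)

for i, j in pairs:
    print(f"residues {i}-{j}: {features[(i, j)]:.1f} A")
```

In a real workflow the coordinates would be read from structure files, and each distance would become one column of the feature matrix passed to the classifier before any feature selection.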