Machine Learning-Based Prediction of Base Editor sgRNA fitness score

Abstract

CRISPR base editors (BEs) enable precise single-nucleotide modifications, offering advantages over CRISPR-Cas9 knock-out for programming a desired genetic effect. However, in pooled screens targeting essential genes, expected genetic outcomes and observed phenotypes frequently disagree: single guide RNAs (sgRNAs) expected to be disruptive often appear “phenotypically silent”, likely because of inefficient editing rather than an absence of functional impact. Here, we investigate whether Cas9-based gene-level sgRNA depletion data can help predict the probability that an sgRNA used in base editing will yield the expected fitness effect in pooled proliferation screening. We analysed proliferative effects (z-scores) from high-throughput CRISPR screens using cytosine BEs and trained machine learning models to predict fitness effects. The models integrate sequence features, edited strand, mutation type, predicted editing efficiencies and Cas9 gene essentiality scores. They discriminate BE sgRNAs that generate a strong phenotypic effect (depletion) in pooled screening, with AUC-ROC greater than 93% in different cell lines. An exhaustive analysis of feature importance shows that BE-associated fitness predictions are driven primarily by sgRNA sequence features rather than by predicted editing efficiency, while Cas9-derived gene essentiality contributes partially to the predictions.
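As a rough illustration of the evaluation metric mentioned above, the following sketch labels sgRNAs as depleted from their z-scores and computes AUC-ROC for a set of predicted probabilities. It is not the authors' pipeline: the function names `label_depleted` and `auc_roc`, the z-score cutoff of -2.0, and all numeric values are assumptions made up for this example.

```python
from typing import List, Sequence


def label_depleted(z_scores: Sequence[float], threshold: float = -2.0) -> List[int]:
    """Return 1 for sgRNAs whose z-score is at or below the threshold.

    The -2.0 default is a hypothetical cutoff, not one taken from the study.
    """
    return [1 if z <= threshold else 0 for z in z_scores]


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy, made-up values for illustration only (not data from the study).
z_scores = [-3.1, -0.2, -2.5, 0.4, -1.0, -2.2]   # observed proliferation z-scores
predicted = [0.9, 0.2, 0.7, 0.1, 0.8, 0.6]       # model-predicted depletion probability

labels = label_depleted(z_scores)
auc = auc_roc(labels, predicted)
print(f"AUC-ROC on toy data: {auc:.3f}")
```

The pairwise formulation is quadratic in the number of sgRNAs, so it is only suitable for small examples. A rank-based implementation gives the same value more efficiently on screen-sized data.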
