Machine Learning-Driven Optimization of Specific, Compact, and Efficient Base Editors via Single-Round Diversification

Abstract

Cytosine and adenosine base editors show great potential in research and clinical applications. Current iterations of the deaminase, the enzyme used to create precise single-nucleotide changes via base editing, exhibit various off-target effects, including Cas-independent off-targeting, off-base editing, and bystander editing. Engineered deaminases are typically derived from eukaryotic deaminases, which are larger and exhibit high levels of Cas-independent DNA editing, or from evolved variants of the E. coli TadA protein (ecTadA), which are smaller but frequently cause off-base editing. To overcome the limitations inherent to using a single protein sequence as the basis for engineering, we diversified 95 newly identified TadA orthologs by introducing literature-derived mutations and applying DNA shuffling, yielding millions of training sequences for measuring base editor efficiency. Rather than pursuing multiple rounds of random mutagenesis and selection, we trained generative models on the performance data from the diversified pools of variants and drew on information-theoretic insights to explore the deaminase sequence space efficiently and generate diverse, high-performing deaminases. From a single round of diversification, we created a small set of novel and specific cytosine and adenosine deaminases that were markedly distinct in sequence from published base editor deaminases. We additionally found that the deaminases created by our model generally outperform those that we identified through typical directed evolution. The novel adenosine and cytosine deaminases identified in this work showed high on-base activity, comparable to the leading published base editors, but with demonstrably lower off-base activity. The cytosine deaminases were particularly compact compared to known sequences due to a truncation in their final α-helix.
