A deep learning and co-conservation framework enable discovery of non-canonical Cas proteins

Abstract

CRISPR-Cas systems are central to prokaryotic adaptive immunity and are widely harnessed for biotechnology. Yet their vast and largely uncharacterized diversity, especially among non-canonical variants, impedes their full exploitation. Here we present BioPrinCRISPR, a class-agnostic computational framework that leverages gene co-conservation, protein domain co-occurrence, and embedding similarity to identify and characterize CRISPR-Cas systems across prokaryotic genomes. Applying BioPrinCRISPR to over one million bacterial genomes, we uncovered numerous canonical and previously uncharacterized systems, revealing a rich landscape of atypical Cas proteins and novel domain architectures. Notably, we identified recurrent fusion proteins with unique enzymatic combinations, suggesting roles in regulatory control or nucleic acid remodeling. Experimental validation of two divergent Cas13an-like effectors demonstrated RNA knockdown capacity in human cells, supporting the framework's predictive power. These findings expand the functional repertoire of CRISPR-associated proteins and highlight unexplored modes of microbial immunity. BioPrinCRISPR is thus a powerful tool for comprehensively mapping CRISPR-Cas diversity, offering new insights into prokaryotic defense and facilitating the discovery of novel candidates for next-generation genome engineering. An accompanying interactive web platform was also developed to facilitate data exploration.
