ELITE: E3 Ligase Inference for Tissue-Specific Elimination: An LLM-Based E3 Ligase Prediction System for Precise Targeted Protein Degradation
Abstract
Targeted protein degradation (TPD) has transformed modern drug discovery by harnessing the ubiquitin–proteasome system to eliminate disease-driving proteins previously deemed undruggable. However, current approaches rely predominantly on a narrow set of ubiquitously expressed E3 ligases, such as Cereblon (CRBN) and Von Hippel–Lindau (VHL), which limits tissue specificity, increases systemic toxicity, and fosters resistance. Here, we present an AI-driven framework for the rational identification of tissue-specific E3 ligases suitable for precision-targeted degradation. Our model leverages a BERT-based protein language architecture trained on billions of sequences to generate contextual embeddings that capture structural and functional motifs relevant to E3–substrate compatibility. By integrating these embeddings with tissue-resolved protein–protein interaction data, the framework predicts ligase–target interactions that are both biologically plausible and context-restricted. This enables the prioritization of ligases capable of driving selective degradation of pathogenic proteins within disease-relevant tissues. The proposed approach offers a scalable path to expand the E3 ligase repertoire and advance TPD toward true precision medicine.
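To make the integration step concrete, the sketch below combines an embedding-based compatibility score with a tissue-resolved interaction filter. It is a hypothetical illustration, not the ELITE implementation: the protein names, the three-dimensional vectors (standing in for protein language model embeddings), and the tissue edge sets are all made up, and cosine similarity is used as a simple stand-in for whatever learned ligase–target predictor the framework actually employs.

```python
import math

# Hypothetical toy embeddings; in the described framework these would be
# contextual embeddings from a BERT-based protein language model.
EMBEDDINGS = {
    "TARGET": [0.9, 0.1, 0.3],
    "LIG_A": [0.8, 0.2, 0.4],
    "LIG_B": [0.1, 0.9, 0.2],
    "LIG_C": [0.7, 0.0, 0.5],
}

# Hypothetical tissue-resolved PPI edges (undirected), keyed by tissue name.
TISSUE_PPI = {
    "liver": {("TARGET", "LIG_A"), ("TARGET", "LIG_B")},
    "brain": {("TARGET", "LIG_C")},
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (stand-in scorer)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_ligases(target, ligases, tissue):
    """Rank candidate ligases for `target`, keeping only those that have an
    interaction edge with the target in `tissue` (the context restriction)."""
    edges = TISSUE_PPI.get(tissue, set())
    scored = []
    for lig in ligases:
        if (target, lig) in edges or (lig, target) in edges:
            scored.append((lig, cosine(EMBEDDINGS[target], EMBEDDINGS[lig])))
    # Highest compatibility first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(rank_ligases("TARGET", ["LIG_A", "LIG_B", "LIG_C"], "liver"))
```

In this toy setup LIG_C scores well on embedding similarity but is excluded from the liver ranking because it has no liver interaction edge, which is the kind of context restriction the abstract describes.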
Significance
Current computational strategies for targeted protein degradation (TPD) largely ignore biological context: they rely on global E3–substrate datasets that overlook tissue specificity and therefore cannot anticipate off-target toxicity or resistance. Our work introduces a context-aware, AI-driven framework that integrates large-scale protein language model embeddings with tissue-resolved protein–protein interaction data to predict compatible, tissue-selective E3 ligases for any pathogenic target. By learning biochemical compatibilities directly from sequence space, this approach moves beyond the limits of curated interaction databases and can generalize to novel ligases or disease mutations. The resulting ligase–target rankings combine biochemical plausibility with spatial expression constraints, providing a scalable foundation for designing degraders that act precisely within disease-relevant tissues. This represents a conceptual and technical advance toward precision degradomics, a next generation of targeted protein degradation in which therapeutic selectivity is defined not only by molecular affinity but also by cellular and tissue context. Next steps include structural integration for ternary complex modeling, biochemical validation of predicted pairs, and deployment of a public platform to guide tissue-specific degrader discovery.
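One way to picture rankings that "combine biochemical plausibility with spatial expression constraints" is to weight a compatibility score by a tissue-specificity measure. The sketch below is a hypothetical illustration under stated assumptions: it uses the tau index (0 for uniform expression across tissues, 1 for expression in a single tissue) as the specificity measure and a simple product as the combination rule. Neither choice is taken from the preprint, and the ligase names, compatibility scores, and expression values are invented.

```python
# Hypothetical expression values (e.g. TPM) per ligase across tissues.
EXPRESSION = {
    "LIG_A": {"liver": 50.0, "brain": 1.0, "kidney": 2.0},
    "LIG_B": {"liver": 20.0, "brain": 20.0, "kidney": 20.0},
}


def tau(values):
    """Tissue-specificity index: 0 = uniform expression, 1 = single tissue."""
    n = len(values)
    peak = max(values)
    if n < 2 or peak <= 0:
        return 0.0
    return sum(1 - v / peak for v in values) / (n - 1)


def combined_rank(compat, tissue):
    """Rank ligases by compatibility x tau, keeping only ligases whose
    expression peaks in `tissue` (hypothetical combination rule)."""
    ranked = []
    for lig, score in compat.items():
        expr = EXPRESSION[lig]
        if expr[tissue] < max(expr.values()):
            continue  # ligase is not maximally expressed in the chosen tissue
        ranked.append((lig, score * tau(list(expr.values()))))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Compatibility scores would come from the sequence-based model.
    print(combined_rank({"LIG_A": 0.9, "LIG_B": 0.95}, "liver"))
```

Here the ubiquitously expressed LIG_B ends up below LIG_A despite a slightly higher compatibility score, mirroring the argument that broadly expressed ligases such as CRBN and VHL offer little tissue selectivity.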
Data science maturity
DSML3: Development/pre-production: Data science output has been rolled out/validated across multiple domains/problems