ELITE: E3 Ligase Inference for Tissue specific Elimination: A LLM Based E3 Ligase Prediction System for Precise Targeted Protein Degradation

Sabyasachi Patjoshi
Sumit Madan
Holger Fröhlich

Abstract

Targeted protein degradation (TPD) has transformed modern drug discovery by harnessing the ubiquitin–proteasome system to eliminate disease-driving proteins previously deemed undruggable. However, current approaches predominantly rely on a narrow set of ubiquitously expressed E3 ligases, such as Cereblon (CRBN) and Von Hippel–Lindau (VHL), which limits tissue specificity, increases systemic toxicity, and fosters resistance. Here, we present an AI-driven framework for the rational identification of tissue-specific E3 ligases suitable for precision-targeted degradation. Our model leverages a BERT-based protein language architecture trained on billions of sequences to generate contextual embeddings that capture structural and functional motifs relevant for E3–substrate compatibility. By integrating these embeddings with tissue-resolved protein-protein interaction data, the framework predicts ligase–target interactions that are both biologically plausible and context-restricted. This enables the prioritization of ligases capable of driving selective degradation of pathogenic proteins within disease-relevant tissues. The proposed approach offers a scalable path to expand the E3 ligase repertoire and advance TPD toward true precision medicine.

The AI architecture used in this study builds upon a well-established, state-of-the-art protein language modeling framework rather than introducing a fundamentally new model design. The principal contribution instead lies in the deliberate engineering of a task-specific dataset that integrates large-scale protein sequence embeddings with tissue-resolved expression and interaction information tailored to targeted protein degradation. By curating and harmonizing these heterogeneous data modalities for the explicit purpose of identifying tissue-selective E3 ligases, we enable a generalizable foundation model to be effectively adapted to a previously unexplored application context. This data-centric advance underpins the framework’s ability to generate biologically meaningful, context-aware predictions and represents a central contribution of the present work.
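
As a rough illustration of how sequence embeddings and tissue-resolved interaction data might be joined into a single training example, the sketch below mean-pools per-residue embeddings into fixed-length protein vectors and concatenates them for an E3–target pair, keeping only pairs with a recorded interaction in the tissue of interest. The function names (`mean_pool`, `pair_features`), the `tissue_ppi` lookup, the protein names, the toy 3-dimensional embeddings, and the pooling and concatenation choices are all assumptions made for illustration; the abstract does not specify how the features are actually constructed.

```python
from typing import Dict, List, Optional, Set, Tuple

Vector = List[float]


def mean_pool(residue_embeddings: List[Vector]) -> Vector:
    """Average per-residue embeddings into one fixed-length protein vector."""
    if not residue_embeddings:
        raise ValueError("protein has no residues")
    dim = len(residue_embeddings[0])
    n = len(residue_embeddings)
    return [sum(row[i] for row in residue_embeddings) / n for i in range(dim)]


def pair_features(
    e3: str,
    target: str,
    tissue: str,
    embeddings: Dict[str, List[Vector]],
    tissue_ppi: Dict[str, Set[Tuple[str, str]]],
) -> Optional[Vector]:
    """Return concatenated [E3 | target] features, or None if the pair
    has no recorded interaction in the given tissue."""
    if (e3, target) not in tissue_ppi.get(tissue, set()):
        return None
    return mean_pool(embeddings[e3]) + mean_pool(embeddings[target])


# Toy data (assumed): hypothetical protein names, made-up residue embeddings.
embeddings = {
    "E3_A": [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]],
    "TARGET_X": [[0.5, 0.5, 0.5]],
}
tissue_ppi = {"liver": {("E3_A", "TARGET_X")}}

liver_features = pair_features("E3_A", "TARGET_X", "liver", embeddings, tissue_ppi)
brain_features = pair_features("E3_A", "TARGET_X", "brain", embeddings, tissue_ppi)
```

In this toy setup the pair yields a six-dimensional feature vector for liver and `None` for brain, where no interaction is recorded. A real pipeline would take embeddings from a pretrained protein language model rather than hand-written lists.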

Significance

Current computational strategies for targeted protein degradation (TPD) largely ignore biological context, relying on global E3–substrate datasets that overlook tissue specificity and thus cannot anticipate off-target toxicity or resistance. Our work introduces a context-aware, AI-driven framework that integrates large-scale protein language model embeddings with tissue-resolved protein-protein interaction data to predict compatible and tissue-selective E3 ligases for any pathogenic target. By learning biochemical compatibilities directly from sequence space, this approach transcends the limitations of curated interaction databases and enables generalization to novel ligases or disease mutations. The resulting ligase–target rankings combine biochemical plausibility with spatial expression constraints, providing a scalable foundation for designing degraders that act precisely within disease-relevant tissues. This represents a conceptual and technical advance toward precision degradomics: a next generation of targeted protein degradation where therapeutic selectivity is defined not only by molecular affinity but also by cellular and tissue context. Subsequent steps will include structural integration for ternary complex modeling, biochemical validation of predicted pairs, and the deployment of a public platform to guide tissue-specific degrader discovery.
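
One way to picture a ranking that combines biochemical plausibility with expression constraints is to weight each ligase's predicted compatibility by how concentrated its expression is in the tissue of interest. The sketch below does this with a simple product. The scoring rule, the function names (`tissue_selectivity`, `rank_ligases`), and every number in it are assumptions for illustration only, not the scoring scheme used in the preprint.

```python
from typing import Dict, List, Tuple


def tissue_selectivity(expression: Dict[str, float], tissue: str) -> float:
    """Fraction of a ligase's total expression found in the given tissue."""
    total = sum(expression.values())
    if total <= 0:
        return 0.0
    return expression.get(tissue, 0.0) / total


def rank_ligases(
    compatibility: Dict[str, float],
    expression: Dict[str, Dict[str, float]],
    tissue: str,
) -> List[Tuple[str, float]]:
    """Rank ligases by compatibility score weighted by tissue selectivity."""
    scored = [
        (ligase, score * tissue_selectivity(expression.get(ligase, {}), tissue))
        for ligase, score in compatibility.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical model scores for one target and made-up expression levels.
compatibility = {"E3_A": 0.9, "E3_B": 0.6}
expression = {
    "E3_A": {"liver": 1.0, "brain": 1.0, "lung": 2.0},  # broadly expressed
    "E3_B": {"liver": 3.0, "brain": 0.0, "lung": 1.0},  # liver-enriched
}

ranking = rank_ligases(compatibility, expression, "liver")
```

With these toy values the liver-enriched `E3_B` ranks above `E3_A` even though its raw compatibility score is lower, which is the kind of trade-off a tissue-aware ranking is meant to expose.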

Data science maturity

DSML3: Development/pre-production: Data science output has been rolled out/validated across multiple domains/problems

Version published to 10.1101/2025.11.05.686884 on bioRxiv
Nov 7, 2025

Related articles

Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model

This article has 13 authors:
1. Peilin Xie
2. Xingchen Liu
3. Lantian Yao
4. Zhihao Zhao
5. Anming Yang
6. Jiahui Guan
7. Zijun Jiao
8. Zhihong Liu
9. Junwen Wang
10. Tzong-Yi Lee
11. Zigang Li
12. Bingyu Cui
13. Ying-Chih Chiang
This article has no evaluations. Latest version Dec 11, 2025

A Survey on Efficient Protein Language Models

This article has 8 authors:
1. Shouren Wang
2. Debargha Ganguly
3. Vinooth Kulkarni
4. Wang Yang
5. Zhuoran Qiao
6. Daniel Blankenberg
7. Vipin Chaudhary
8. Xiaotian Han
This article has no evaluations. Latest version Dec 24, 2025

Artificial Intelligence–Driven Structural Mining Enables Functional Inference in the Human Dark Proteome

This article has 7 authors:
1. Valentina Carbonari
2. Annamaria Defilippo
3. Ugo Lomoio
4. Caterina Francesca Perri
5. Barbara Puccio
6. Pierangelo Veltri
7. Pietro Hiram Guzzi
This article has no evaluations. Latest version Dec 23, 2025