ProteomeLM: A proteome-scale language model allowing fast prediction of protein-protein interactions and gene essentiality across taxa

Cyril Malbranke
Gionata Paolo Zalaffi
Anne-Florence Bitbol

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Language models starting from biological sequence data are advancing many inference problems, both at the scale of single proteins, and at the scale of genomic neighborhoods. In this paper, we introduce ProteomeLM, a transformer-based language model that reasons on entire proteomes from species spanning the tree of life. Leveraging protein language model embeddings, ProteomeLM is trained to reconstruct masked protein embeddings using the whole proteomic context. It thus learns contextualized protein representations reflecting proteome-scale functional constraints. We show that ProteomeLM spontaneously captures protein-protein interactions (PPI) in its attention coefficients. We demonstrate that it screens whole interactomes orders of magnitude faster than amino-acid coevolution-based methods, and substantially outperforms them. We further develop ProteomeLM-PPI, a supervised PPI prediction network that combines ProteomeLM embeddings and attention coefficients, and achieves state-of-the-art performance across species and benchmarks. Finally, we introduce ProteomeLM-Ess, a supervised predictor of gene essentiality that generalizes across diverse taxa. Our results highlight the power of proteome-scale language models for addressing function and interactions at the organism level.

Version published to 10.1101/2025.08.01.668221 on bioRxiv
Aug 3, 2025

The Deep Core: Mapping the 0.91% Regulatory Backbone of the Human Proteome and Its Role in Cancer Drug Resistance

This article has 1 author:
1. Andres Pirolo
This article has no evaluationsLatest version Feb 4, 2026
A Survey on Efficient Protein Language Models

This article has 8 authors:
1. Shouren Wang
2. Debargha Ganguly
3. Vinooth Kulkarni
4. Wang Yang
5. Zhuoran Qiao
6. Daniel Blankenberg
7. Vipin Chaudhary
8. Xiaotian Han
This article has no evaluationsLatest version Dec 24, 2025
Transcriptome Graph Transformer--A Graph Transformer-Based Unsupervised Model for Transcriptome Data Analysis

This article has 3 authors:
1. Teng Long
2. Sachit Satyal
3. Jean Gao
This article has no evaluationsLatest version Jan 9, 2026

Discuss this preprint

Listed in

Abstract

Article activity feed

Related articles

The Deep Core: Mapping the 0.91% Regulatory Backbone of the Human Proteome and Its Role in Cancer Drug Resistance

A Survey on Efficient Protein Language Models

Transcriptome Graph Transformer--A Graph Transformer-Based Unsupervised Model for Transcriptome Data Analysis