Extending Prot2Token: Aligning Protein Language Models for Unified and Diverse Protein Prediction Tasks

Mahdi Pourmirzaei
Ye Han
Farzaneh Esmaili
Mohammadreza Pourmirzaei
Salhuldin Alqarghuli
Kai Chen
Dong Xu

Abstract

Comprehensive protein function and property prediction remains a major challenge due to the vast diversity of sequences, structural variations, and limited labeled data. Existing models are often task-specific and require independent training, which limits scalability. To address this, we extend Prot2Token, a unified autoregressive framework that focuses on the post-training alignment of pre-trained protein language models (PLMs), to new applications. Our approach applies next-token prediction to a new set of protein prediction tasks, including protein-protein structure similarity, 3D structure prediction, mutation stability, post-translational modifications (PTMs), substrate-kinase phosphorylation sites, protein-protein affinity, and protein-ion binding sites. We introduce a self-supervised pre-training stage for the decoder, which improves model initialization and downstream predictions. By integrating a causal autoregressive transformer with a pre-trained ESM-2 encoder, our model aligns diverse protein tasks within a single framework. Additionally, we discuss the opportunities and limitations of this approach, providing insights for future research in optimizing PLMs as a general tool for broader biological applications. Code is available in the GitHub repository.

Version published to 10.1101/2025.03.03.641065v1 on bioRxiv
Mar 11, 2025

Related articles

P 3: A Framework for Predicting Protein-Protein Interactions Using Large Language Models

This article has 6 authors:
1. Lamiaa Basyoni
2. Jovana Aleksic
3. Stephanie Schaefer-Ramadan
4. Yue Guan
5. Joel Malek
6. Ahmed Serag
This article has no evaluations
Latest version May 22, 2025

Mechanism-Aware Protein-Protein Interaction Prediction via Contact-Guided Dual Attention on Protein Language Models

This article has 5 authors:
1. Shuchen Deng
2. Xuanjun Wan
3. Zichun Mu
4. Sheng-You Huang
5. Chengfei Yan
This article has no evaluations
Latest version Jul 11, 2025

GeneChat: A Multi-Modal Large Language Model for Gene Function Prediction

This article has 3 authors:
1. Shashi Dhanasekar
2. Akash Saranathan
3. Pengtao Xie
This article has no evaluations
Latest version Jun 6, 2025
