Task-Specialized Protein Language Models Decode the Sequence Grammar of Post-Translational Modification Sites

Subinoy Adhikari
Jagannath Mondal

Abstract

Post-translational modifications (PTMs) regulate protein signaling, localization, degradation, and cellular decision-making, yet the sequence determinants that distinguish modified from chemically eligible but unmodified residues remain difficult to decode at proteome scale. Here, we examine whether adapting a general protein language model to PTM-site prediction can reveal the biochemical logic underlying residue-level modification. We fine-tune ESM2, a protein language model trained on tens of millions of evolutionarily diverse protein sequences, for phosphorylation, acetylation, and ubiquitination-site prediction. To address the pronounced class imbalance inherent in proteome-wide PTM annotation, we combine parameter-efficient fine-tuning with focal-loss training. The resulting task-specialized models show that PTM recognition depends on model capacity, annotation depth, and modification chemistry: phosphorylation benefits from larger models, whereas acetylation and ubiquitination peak at intermediate scale. More importantly, the fine-tuned phosphorylation model exposes three layers of biological organization: it recovers canonical kinase-recognition motifs without kinase-label supervision, resolves pathway-level functional relationships among proteins from sequence-derived embeddings, and preserves evolutionary signatures of homologous phosphorylation sites across 200 eukaryotic species. These results establish task-specialized protein language models as interpretable instruments for probing PTM-site biochemistry, kinase specificity, functional organization, and evolutionary conservation.
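The abstract names focal-loss training as the remedy for class imbalance but does not give the formulation or hyperparameters. As a rough illustration only, the sketch below uses the commonly cited binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, not values taken from the preprint.

```python
import math


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-12):
    """Mean binary focal loss over per-residue predictions.

    probs  : predicted probability that each residue is modified
    labels : 1 for an annotated PTM site, 0 for an unmodified residue
    gamma, alpha : illustrative defaults (assumed, not from the preprint)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # p_t is the probability the model assigns to the true class.
        p_t = p if y == 1 else 1.0 - p
        # alpha_t weights positives by alpha and negatives by 1 - alpha.
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # Clamp to avoid log(0) on fully confident wrong predictions.
        p_t = min(max(p_t, eps), 1.0)
        # (1 - p_t)^gamma shrinks the loss of well-classified residues.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# One confidently rejected negative and one poorly scored true site.
easy_negative = binary_focal_loss([0.01], [0])
hard_positive = binary_focal_loss([0.10], [1])
print(easy_negative < hard_positive)
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted cross-entropy. Larger `gamma` values shift the training signal away from the abundant, easily rejected unmodified residues and toward the rare, hard-to-classify sites.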

Version published to 10.64898/2026.05.08.723918 on bioRxiv
May 12, 2026

Related articles

EvoRMD: Integrating Biological Context and Evolutionary RNA Language Models for Interpretable Prediction of RNA Modifications

This article has 6 authors:
1. Bo Wang
2. Hao Zhang
3. Taoyong Cui
4. Xiaoyu Wang
5. Jiangning Song
6. Hao Xu
This article has no evaluations. Latest version: Mar 25, 2026

An LLM-driven pipeline for proteomics-based detection and structural modeling of post-translational modifications

This article has 6 authors:
1. August George
2. Daniel Mejia-Rodriguez
3. Xiaolu Li
4. Paul Rigor
5. Margaret S. Cheung
6. Aivett Bilbao
This article has no evaluations. Latest version: May 6, 2026

ProteinFlux: accurate, rapid and scalable generative prediction of protein dynamics driven by post-translational modifications

This article has 24 authors:
1. Qiuting Qian
2. Jiahua Peng
3. Dongge Ma
4. Kaining Liu
5. Yihan Cheng
6. Yanqi Deng
7. Jingjing Zhao
8. Shen Su
9. Yifei Yao
10. Yuan Qu
11. Rao Fu
12. Jinliang Liu
13. Make Zhao
14. Yueming Xiao
15. Keying Wang
16. Yizhen Wu
17. Yijun Wang
18. Qiufan Xu
19. Jiarui Wang
20. David C. Hay
21. Yuehai Ke
22. Yong Wang
23. Michael J. Shipston
24. Ying Chi
This article has no evaluations. Latest version: May 11, 2026
