Post-Translational Modification Prediction via Prompt-Based Fine-Tuning of a GPT-2 Model

Palistha Shrestha
Jeevan Kandel
Hilal Tayara
Kil To Chong

Abstract

Post-translational modifications (PTMs) are pivotal in modulating protein functions, influencing key cellular processes such as signaling, localization, and protein degradation. The complexity of these biological interactions necessitates efficient predictive methodologies. In this work, we introduce PTMGPT2, an interpretable protein language model that utilizes prompt-based fine-tuning to improve its accuracy and generalizability in predicting PTMs. Drawing inspiration from recent advancements in GPT-based architectures, PTMGPT2 adopts an unsupervised learning approach to identify PTMs. It utilizes a custom prompt to guide the model through the subtle linguistic patterns encoded in amino acid sequences, generating tokens indicative of PTM sites. To provide interpretability, we visualize attention profiles from the model’s final decoder layer, elucidating sequence motifs essential for molecular recognition and modification variability. Furthermore, we conducted analyses to investigate the effects of mutations at or near PTM sites, thereby offering deeper insights into protein functionality. Our analysis encompasses a comprehensive dataset comprising 388,084 modification sites across 19 distinct PTM types, facilitating the identification of novel PTM sites. Comparative assessments reveal that PTMGPT2 outperforms existing methods by an average of 5.45% in MCC, underscoring its potential in identifying novel therapeutic strategies, disease associations, and drug targets.
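
The abstract reports its headline comparison in MCC (Matthews correlation coefficient), a binary-classification metric that stays informative when positive sites are much rarer than negative ones. The sketch below shows only the standard definition of the metric; the function name `mcc` and the confusion-matrix counts are illustrative assumptions and are not taken from the preprint.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]: 1 is perfect agreement, 0 is no better
    than chance, -1 is total disagreement. By the usual convention, the
    result is 0.0 when any marginal total is zero.
    """
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts for a site-level PTM classifier (not from the paper).
    print(round(mcc(tp=80, tn=900, fp=40, fn=20), 3))
```

Because the denominator uses all four cells of the confusion matrix, a classifier that labels every residue as unmodified scores 0 here even though its raw accuracy could be high on an imbalanced dataset.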

Version published to 10.21203/rs.3.rs-4194361/v1 on Research Square
Apr 1, 2024
