PhosF3C: A Feature Fusion Architecture with Fine-Tuned Protein Language Model and Conformer for prediction of general phosphorylation site

Yuhuan Liu
Haitian Zhong
Jixiu Zhai
Xueying Wang
Tianchi LU

Abstract

Protein phosphorylation is a key post-translational modification (PTM) that offers essential insight into protein properties, so accurate prediction of phosphorylation sites is highly significant. Drawing on the emerging capabilities of large language models (LLMs), we apply LoRA fine-tuning to ESM2, a powerful protein language model, to extract features efficiently with minimal computational resources while adapting the model to the task. We also integrate the Conformer architecture with the Feature Coupling Unit (FCU) to enhance the exchange between local and global features, which further improves prediction accuracy. Our model achieves state-of-the-art (SOTA) performance, with AUC scores of 79.5%, 76.3%, and 71.4% for the S, T, and Y sites, respectively, on the general datasets. Building on the feature extraction capabilities of LLMs, we analyze the learned protein representations with respect to structure, sequence, and several chemical properties, such as hydrophobicity (GRAVY), surface charge, and isoelectric point. We also propose Linear Regression Tomography (LRT), a top-down test method that uses these representations to probe the model's feature extraction capabilities and offers a pathway to improved interpretability. Our resources, including data and code, are publicly accessible at https://github.com/SkywalkerLuke/PhosF3C
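For readers unfamiliar with LoRA, the idea behind the fine-tuning step can be illustrated with a minimal sketch. It shows the generic LoRA formulation, in which a frozen weight matrix W is augmented by a trainable low-rank update scaled by alpha / r, and it is not the authors' implementation. The class name `LoRALinear`, the helper `matmul`, and the toy dimensions are hypothetical and chosen only for illustration; a real setup would presumably attach such adapters to layers of ESM2 through a deep learning library.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Toy linear layer: frozen weight W plus a low-rank update (alpha / r) * B @ A.

    Hypothetical illustration of the generic LoRA idea, not the PhosF3C code.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.weight = weight              # frozen, shape d_out x d_in
        self.d_out = len(weight)
        self.d_in = len(weight[0])
        self.rank = rank
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the update is zero
        # at initialization and the layer behaves like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(self.d_out)]

    def effective_weight(self):
        """Return W + (alpha / r) * B @ A."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to one input vector of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        """Only A and B are trained: r * (d_in + d_out) values."""
        return self.rank * (self.d_in + self.d_out)


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    layer = LoRALinear(identity, rank=2, alpha=4.0)
    print(layer.forward([1.0, 2.0, 3.0, 4.0]))
    print(layer.trainable_params(), "trainable values versus", 4 * 4, "frozen")
```

The saving comes from the parameter count: only r * (d_in + d_out) values are trained per adapted matrix instead of d_in * d_out, which is small whenever the rank r is much smaller than the layer dimensions.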

Version published to 10.21203/rs.3.rs-5871318/v1 on Research Square
Jan 23, 2025

