Pro4S: prediction of protein solubility by fusing sequence, structure, and surface

Jie Qian
Lin Yang
Renxiao Wang
Yifei Qi


Abstract

Protein solubility is a critical physicochemical property influencing protein stability, therapeutic efficacy, and overall developability in drug discovery. However, traditional experimental methods for assessing solubility are often resource-intensive and time-consuming. To address these limitations, computational approaches leveraging artificial intelligence have emerged, yet current models generally treat qualitative classification and quantitative regression as separate tasks and rely predominantly on sequence-based information, neglecting crucial structural and surface characteristics. Here, we introduce Pro4S, a novel multimodal predictive model that integrates protein language models, structural data, and surface descriptors using advanced contrastive learning techniques. Our unified framework achieves significant improvements in prediction accuracy, robustness, and generalizability for both qualitative and quantitative solubility assessments. Benchmark comparisons demonstrate that Pro4S consistently outperforms existing state-of-the-art predictors across diverse datasets. Furthermore, by applying Pro4S to the emerging area of de novo protein design, we validated a strong correlation between predicted solubility and experimental expression levels, reducing the proportion of non-expressed proteins by 52.7% while retaining 96.7% of highly expressed proteins. This highlights Pro4S’s potential to serve as a reliable upfront screening tool for increasing expression success rates and accelerating rational protein engineering.
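The screening figures quoted in the abstract can be read as the result of a threshold filter on predicted solubility. The sketch below shows one plausible way to compute such figures. It is not the authors' evaluation code. The function name `screening_summary`, the expression labels (`"none"`, `"low"`, `"high"`), the threshold, and the toy data are all hypothetical. The sketch also assumes that "reducing the proportion by" means a relative reduction in the fraction of non-expressed designs, which the abstract does not specify.

```python
from typing import List, Tuple


def screening_summary(
    scores: List[float], labels: List[str], threshold: float
) -> Tuple[float, float]:
    """Summarize a solubility-threshold screen (hypothetical helper).

    scores: predicted solubility per design (any predictor; assumed input).
    labels: measured expression category per design: "none", "low" or "high".
    Returns (relative reduction in the non-expressed fraction,
             fraction of highly expressed designs retained).
    """
    # Keep only the designs whose predicted solubility passes the threshold.
    kept = [lab for score, lab in zip(scores, labels) if score >= threshold]
    if not kept or "none" not in labels or "high" not in labels:
        raise ValueError("need kept designs and both 'none' and 'high' labels")

    frac_none_before = labels.count("none") / len(labels)
    frac_none_after = kept.count("none") / len(kept)

    # Relative drop in the share of non-expressed designs after filtering.
    relative_reduction = 1 - frac_none_after / frac_none_before
    # Share of all highly expressed designs that survive the filter.
    high_retained = kept.count("high") / labels.count("high")
    return relative_reduction, high_retained


# Toy data for illustration only; not taken from the preprint.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = ["high", "high", "none", "high", "none", "low", "none", "none"]
reduction, retained = screening_summary(toy_scores, toy_labels, threshold=0.5)
print(f"non-expressed fraction reduced by {reduction:.1%}")
print(f"highly expressed retained: {retained:.1%}")
```

In practice the threshold trades off the two numbers: raising it removes more non-expressed designs but risks discarding highly expressed ones.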

Version published to 10.1101/2025.11.05.686869 on bioRxiv
Nov 7, 2025
