PSTP: Decoding Latent Sequence Grammar for Protein Phase Separation through Transfer Learning and Attention

Abstract

Phase separation (PS) is essential in various biological processes, so high-accuracy predictive algorithms are needed to study the many uncharacterized sequences and to accelerate experimental validation. However, many recent prediction methods generalize poorly because they rely on engineered features. Furthermore, accurately identifying the protein regions involved in PS remains challenging. To address these limitations, we propose PSTP, a model that combines a dual-language-model embedding strategy with a lightweight attention module. The attention layer enables reliable residue-level PS predictions, identifying 84% of PS regions in PhaSePro and substantially improving the correlation coefficient compared to existing models. PSTP also demonstrates robust performance in predicting PS propensity across various types of PS proteins and shows potential for predictions on artificial proteins. By analyzing more than 160,000 variants, PSTP characterizes the link between the incidence of pathogenic variants and residue-level PS propensities. PSTP's predictive power and broad applicability make it a valuable tool for understanding biomolecular condensates and disease mechanisms.
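
The general idea of pooling two per-residue embeddings with a lightweight attention layer can be illustrated with a minimal sketch. This is not the PSTP implementation: the abstract does not specify the architecture, so the function `attention_pool`, the weight vectors `w_attn` and `w_out`, the embedding sizes, and the random inputs are all hypothetical. The sketch only shows how a single attention layer can yield both a sequence-level score and one weight per residue.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def attention_pool(emb_a, emb_b, w_attn, w_out):
    """Toy attention pooling over two concatenated per-residue embeddings.

    emb_a, emb_b: per-residue vectors from two hypothetical language models.
    w_attn, w_out: hypothetical weight vectors whose length equals the
    concatenated embedding dimension.
    Returns (sequence_score, residue_weights).
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("both embeddings must cover the same residues")
    # Concatenate the two embeddings residue by residue.
    fused = [a + b for a, b in zip(emb_a, emb_b)]
    # One attention weight per residue; the weights sum to 1.
    weights = softmax([dot(x, w_attn) for x in fused])
    # Attention-weighted average of the fused residue vectors.
    dim = len(fused[0])
    pooled = [sum(w * x[i] for w, x in zip(weights, fused)) for i in range(dim)]
    # Logistic output as a sequence-level score in (0, 1).
    score = 1.0 / (1.0 + math.exp(-dot(pooled, w_out)))
    return score, weights


# Random stand-ins for real embeddings (assumption: 6 residues, dims 4 and 3).
random.seed(0)
n_res, d_a, d_b = 6, 4, 3
emb_a = [[random.gauss(0, 1) for _ in range(d_a)] for _ in range(n_res)]
emb_b = [[random.gauss(0, 1) for _ in range(d_b)] for _ in range(n_res)]
w_attn = [random.gauss(0, 1) for _ in range(d_a + d_b)]
w_out = [random.gauss(0, 1) for _ in range(d_a + d_b)]

score, weights = attention_pool(emb_a, emb_b, w_attn, w_out)
```

In this toy setup, `weights` holds one value per residue and `score` is a single value for the whole sequence. How PSTP itself derives residue-level PS propensities from its attention layer is described in the full article, not in this sketch.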
