Protein codes promote selective subcellular compartmentalization

Henry R. Kilgore
Itamar Chinn
Peter G. Mikhael
Ilan Mitnikov
Catherine Van Dongen
Guy Zylberberg
Lena Afeyan
Salman F. Banani
Susana Wilson-Hawken
Tong Ihn Lee
Regina Barzilay
Richard A. Young

Abstract

Cells have evolved mechanisms to distribute ~10 billion protein molecules to subcellular compartments where diverse proteins involved in shared functions must assemble. In this study, we demonstrate that proteins with shared functions share amino acid sequence codes that guide them to compartment destinations. We developed a protein language model, ProtGPS, that predicts with high performance the compartment localization of human proteins excluded from the training set. ProtGPS successfully guided generation of novel protein sequences that selectively assemble in the nucleolus. ProtGPS identified pathological mutations that change this code and lead to altered subcellular localization of proteins. Our results indicate that protein sequences contain not only a folding code but also a previously unrecognized code governing their distribution to diverse subcellular compartments.

Version published to 10.1126/science.adq2634
Mar 7, 2025
Arcadia Science
Feb 7, 2025

The area under the receiver operator curve (AUC-ROC) showed that protein compartments could be predicted with remarkable accuracy (0.83-0.95) across the 12 different compartments (Fig. 1D).
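For readers who want to see what a per-compartment AUC-ROC involves, here is a minimal sketch that assumes multi-label ground truth and model scores are available as plain Python lists. It is not the authors' evaluation code: the function names, the compartment names, and the toy labels and scores are all hypothetical placeholders. AUC-ROC is computed through the rank-sum (Mann-Whitney U) identity, treating each compartment as its own binary problem.

```python
def auc_roc(labels, scores):
    """AUC-ROC for binary labels (0/1) via the rank-sum identity.

    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs at least one positive and one negative")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def per_compartment_auc(y_true, y_score, compartments):
    """One AUC-ROC per compartment (column), each scored independently."""
    return {
        name: auc_roc([row[c] for row in y_true], [row[c] for row in y_score])
        for c, name in enumerate(compartments)
    }


# Placeholder data: two hypothetical compartments, four hypothetical proteins.
compartments = ["compartment_A", "compartment_B"]
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.7], [0.6, 0.4], [0.1, 0.5]]

aucs = per_compartment_auc(y_true, y_score, compartments)
for name, value in aucs.items():
    print(f"{name}: AUC-ROC = {value:.2f}")
```

Because each compartment is scored on its own column, a compartment with very few positives yields a noisier estimate, which is worth keeping in mind when comparing values across compartments.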

ESM2 performance can be sensitive to the composition of its training data (e.g., https://www.biorxiv.org/content/10.1101/2024.03.07.584001v1.abstract). Specifically, class biases in the training data can be recapitulated in generated sequences.

Given that AUC-ROC varies by compartment (Fig. 1D) and that the compartments differ in their number of input sequences (Fig. 1B), I wonder whether you examined possible biases in ProtGPS's behavior. Does ProtGPS more readily generate sequences that are suited for certain compartments than others? Is this explainable by the statistical distribution of the training data?
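One simple way to start on this question is a rank correlation between the number of training sequences per compartment and the per-compartment AUC-ROC. The sketch below assumes both quantities are available as lists in the same compartment order. The counts and AUC values shown are made-up placeholders, not numbers from the preprint, and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with ties assigned the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder values only; substitute the real per-compartment numbers.
training_counts = [100, 200, 400, 800]
compartment_aucs = [0.80, 0.85, 0.90, 0.95]

rho = spearman(training_counts, compartment_aucs)
print(f"Spearman rho between training-set size and AUC-ROC: {rho:.2f}")
```

A strong positive correlation would suggest that performance tracks training-set size. With only 12 compartments, though, any such estimate is coarse, and it says nothing by itself about bias in generated sequences, which would need a separate check.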

Version published to 10.1101/2024.04.15.589616 on bioRxiv
Apr 17, 2024

