ProtSpace: Protein Universe in Your Browser

Tobias Senoner
Peyman Vahidi
Tobias Olenyi
Florin Senoner
Gökhan Sisman
Elias Kahl
Burkhard Rost
Ivan Koludarov

Abstract

Protein Language Models (pLMs) generate per-protein embeddings that encode functional, structural, and evolutionary information, yet the relationships captured in these representations remain difficult to explore systematically. ProtSpace (https://protspace.app) is a web application for interactive visualization of pLM embedding spaces, enabling hypothesis generation directly in the browser without installation. Unlike traditional network-based tools that exclusively visualize amino acid sequence similarity, ProtSpace explores embedding spaces, revealing relationships often not captured by traditional comparisons. Users provide protein sequences or pre-computed embeddings through a Google Colab notebook or the Python CLI; the pipeline applies dimensionality reduction, retrieves 38 annotation types spanning UniProt, InterPro, NCBI Taxonomy, TED structural domains, and sequence-based predictors served via Biocentral, and produces a portable binary file for the browser-based viewer. WebGL-accelerated rendering supports interactive exploration of over 570,000 proteins. Distinctive features include per-point pie charts for multi-label annotations and integrated 3D structure viewing through AlphaFold2 predictions. All computation happens on the user’s machine, ensuring data privacy. We demonstrate the utility of ProtSpace through a progressive zoom-in across biological scales: from global proteome organization of Swiss-Prot, through cross-species comparison revealing conserved and lineage-specific families, to functional hypothesis generation within the beta-lactamase superfamily. ProtSpace is freely available at https://protspace.app under the Apache 2.0 license.
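To make the dimensionality-reduction step concrete, here is a minimal sketch of projecting per-protein embeddings to 2D map coordinates. The abstract does not say which reduction methods ProtSpace applies, so PCA is used here purely as an illustrative stand-in. The function name `pca_project`, the 1024-dimensional toy embeddings, and the random data are all assumptions made for this example, not part of the ProtSpace pipeline or its API.

```python
import numpy as np


def pca_project(embeddings: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project row-wise embeddings onto their top principal components.

    Hypothetical helper for illustration; not a ProtSpace function.
    """
    # Center each embedding dimension so the components capture variance.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # SVD of the centered matrix: rows of vt are the principal axes,
    # ordered by decreasing singular value (i.e. decreasing variance).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Toy stand-in for pLM output: 100 "proteins" x 1024 dimensions (assumed size).
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(100, 1024))

coords = pca_project(toy_embeddings, n_components=2)
print(coords.shape)
```

Each row of `coords` would correspond to one point on the map. In practice a nonlinear method could be substituted at the same step without changing the shape of the result.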

Key points

ProtSpace is a free, open-source web application that visualizes protein language model (pLM) embeddings as interactive maps, scaling to over 570,000 proteins entirely client-side.

A zero-installation Google Colab notebook and a Python CLI prepare visualization-ready bundles from FASTA files, UniProt queries, or pre-computed HDF5 embeddings, automatically retrieving 38 annotation types from five sources (UniProt, InterPro, NCBI Taxonomy, TED structural domains, and Biocentral sequence predictors) alongside custom CSV metadata.

Application examples demonstrate that embedding visualizations generate testable biological hypotheses at multiple scales, from proteome-wide organization through species-level comparison to family-level functional discovery, and that these are complementary to traditional sequence-based analyses.

Version published to 10.64898/2026.05.04.722720 on bioRxiv
May 7, 2026

Related articles

Beyond profiles: supervised repeat annotation using protein embeddings

This article has 4 authors:
1. Kaiyu Qiu
2. Jan Ludwiczak
3. Andrei N. Lupas
4. Stanislaw Dunin-Horkawicz
This article has no evaluations. Latest version: May 20, 2026

LIVIA: a browser-based tool for assessing and visualizing predicted protein interactions

This article has 2 authors:
1. Ah-Ram Kim
2. Norbert Perrimon
This article has no evaluations. Latest version: May 10, 2026

ANYI: The ANnotated Yeast Interactome

This article has 8 authors:
1. Daniel A. Nissley
2. Muskan Goel
3. Xavier Castellanos-Girouard
4. Charles P. Kuntz
5. Yiqing Wang
6. M. Shahid Mukhtar
7. Adrian Serohijos
8. Jonathan P. Schlebach
This article has no evaluations. Latest version: May 5, 2026
