Contrastive learning unites sequence and structure in a global representation of protein space

Guy Yanai
Gabriel Axel
Liam M. Longo
Nir Ben-Tal
Rachel Kolodny

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Amino acid sequence dictates the three-dimensional structure and biological function of proteins. Yet, despite decades of research, our understanding of the interplay between sequence and structure is incomplete. To meet this challenge, we introduce Contrastive Learning Sequence-Structure (CLSS), an AI-based contrastive learning model trained to co-embed sequence and structure information in a self-supervised manner. We trained CLSS on large and diverse sets of protein building blocks called domains. CLSS represents both sequences and structures as vectors in the same high-dimensional space, where distance relates to sequence-structure similarity. Thus, CLSS provides a natural way to represent the protein universe, reflecting evolutionary relationships, as well as structural changes. We find that CLSS refines expert knowledge about the global organization of protein space, and highlights transitional forms that resist hierarchical classification. CLSS reveals linkage between domains of seemingly separate lineages, thereby significantly improving our understanding of evolutionary design.

Version published to 10.1101/2025.09.05.674454 on bioRxiv
Sep 8, 2025

The Evolution of the AlphaFold Architecture

This article has 1 author:
1. Y.C.B.J. Dissanayaka
This article has no evaluationsLatest version Jan 9, 2026
Emergence of Biological Structural Discovery in General-Purpose Language Models

This article has 1 author:
1. Liang Wang
This article has no evaluationsLatest version Jan 8, 2026
Artificial Intelligence–Driven Structural Mining Enables Functional Inference in the Human Dark Proteome

This article has 7 authors:
1. Valentina Carbonari
2. Annamaria Defilippo
3. Ugo Lomoio
4. Caterina Francesca Perri
5. Barbara Puccio
6. Pierangelo Veltri
7. Pietro Hiram Guzzi
This article has no evaluationsLatest version Dec 23, 2025

Discuss this preprint

Listed in

Abstract

Article activity feed

Related articles

The Evolution of the AlphaFold Architecture

Emergence of Biological Structural Discovery in General-Purpose Language Models

Artificial Intelligence–Driven Structural Mining Enables Functional Inference in the Human Dark Proteome