Multi-Modal Protein Representation Learning with CLASP


Abstract

Effectively integrating data modalities spanning proteins' amino acid sequences, three-dimensional structures, and curated biological descriptions can yield informative representations that capture complementary views of proteins. Here, we introduce CLASP, a unified tri-modal framework that combines the strengths of geometric deep learning, natural-language large language models (LLMs), protein LLMs, and contrastive learning to learn informative protein representations from structure, amino acid sequence, and description. We show that CLASP enables accurate zero-shot classification and retrieval, such as matching a protein structure to its sequence or description, outperforming state-of-the-art baselines. CLASP embeddings also cluster more cleanly by protein family, and ablation studies confirm that all three modalities contribute synergistically to performance. Our results highlight the power of integrating structural, sequential, and textual signals in a single model, establishing CLASP as a general-purpose embedding framework for protein understanding.
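The tri-modal contrastive objective described above can be illustrated with pairwise CLIP-style InfoNCE losses between each pair of modality embeddings (structure–sequence, structure–text, sequence–text). The sketch below is a minimal NumPy illustration of that general idea; the temperature value, the symmetric bidirectional loss, and the equal-weight sum over modality pairs are our assumptions for exposition, not necessarily CLASP's exact formulation.

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of embeddings.

    Row i of `a` and row i of `b` form a positive pair (the same protein seen
    through two modalities); all other rows in the batch act as negatives.
    The temperature of 0.07 is an illustrative choice, not CLASP's.
    """
    # L2-normalize so the dot products below are cosine similarities.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))

    def cross_entropy(l):
        # Stable log-softmax over each row, then pick the diagonal (positives).
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # Average the two retrieval directions (a -> b and b -> a).
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

def tri_modal_loss(struct_emb, seq_emb, text_emb):
    """Equal-weight sum of pairwise contrastive losses over three modalities
    (an assumed aggregation for illustration)."""
    return (info_nce(struct_emb, seq_emb)
            + info_nce(struct_emb, text_emb)
            + info_nce(seq_emb, text_emb))
```

In a setup like this, each modality has its own encoder (a geometric network for structure, a protein LLM for sequence, a natural-language LLM for text) projecting into a shared embedding space, and minimizing the summed pairwise loss pulls the three views of the same protein together, which is what makes zero-shot cross-modal retrieval possible.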
