FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning

Dan Kalifa
Uriel Singer
Kira Radinsky

Abstract

Accurate protein representations that integrate sequence and three-dimensional (3D) structure are critical to many biological and biomedical tasks. Most existing models either ignore structure or combine it with sequence through a single, static fusion step. Here we present FusionProt, a unified model that learns representations via iterative, bidirectional fusion between a protein language model and a structure encoder. A single learnable token serves as a carrier, alternating between sequence attention and spatial message passing across layers. FusionProt is evaluated on Enzyme Commission (EC), Gene Ontology (GO), and mutation stability prediction tasks. It improves F_max by a median of +1.3 points (up to +2.0) across EC and GO benchmarks, and boosts AUROC by +3.6 points over the strongest baseline on mutation stability. Inference cost remains practical, with only about 2–5% runtime overhead. Beyond state-of-the-art performance, we further demonstrate FusionProt’s practical relevance through representative biological case studies, suggesting that the model captures biologically relevant features.
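The carrier-token idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the single-head attention, the mean-aggregation message passing, the choice to connect the token to every residue in the structure graph, the random (untrained) weights, and all names such as `fuse` and `fusion_token` are assumptions made purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sequence_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention with a residual connection (assumed form)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = (q @ k.T) / np.sqrt(k.shape[-1])
    return tokens + softmax(scores, axis=-1) @ v

def message_passing(nodes, adj, w_msg):
    """Mean-aggregation message passing with a residual connection (assumed form)."""
    degree = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    neighbor_mean = (adj @ nodes) / degree
    return nodes + np.tanh(neighbor_mean @ w_msg)

def fuse(seq_feats, struct_feats, contact_adj, fusion_token, layers):
    """Alternate the shared token between the sequence and structure views."""
    n = seq_feats.shape[0]
    # Structure graph plus one extra node (index 0) for the fusion token,
    # linked to every residue. This connectivity is an assumption.
    adj = np.zeros((n + 1, n + 1))
    adj[1:, 1:] = contact_adj
    adj[0, 1:] = 1.0
    adj[1:, 0] = 1.0
    token = fusion_token
    for w_q, w_k, w_v, w_msg in layers:
        # Sequence side: the token attends to, and is attended by, residues.
        seq = sequence_attention(np.vstack([token, seq_feats]), w_q, w_k, w_v)
        token, seq_feats = seq[0], seq[1:]
        # Structure side: the same token exchanges messages over the graph.
        graph = message_passing(np.vstack([token, struct_feats]), adj, w_msg)
        token, struct_feats = graph[0], graph[1:]
    return token, seq_feats, struct_feats

# Toy inputs: 6 residues, 8-dimensional features, chain-like contact map.
rng = np.random.default_rng(0)
n_residues, dim, n_layers = 6, 8, 2
seq_feats = rng.normal(size=(n_residues, dim))
struct_feats = rng.normal(size=(n_residues, dim))
contact_adj = np.eye(n_residues, k=1) + np.eye(n_residues, k=-1)
fusion_token = np.zeros(dim)  # would be a learned parameter in a real model
layers = [
    tuple(rng.normal(scale=0.1, size=(dim, dim)) for _ in range(4))
    for _ in range(n_layers)
]
token, seq_out, struct_out = fuse(
    seq_feats, struct_feats, contact_adj, fusion_token, layers
)
```

In this toy setup, structural information reaches the sequence features only from the second layer onward, because the token must first pass through the structure side. That is the sense in which the fusion is iterative rather than a single static step.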

Version published to 10.1101/2025.08.06.668973 on bioRxiv
Aug 8, 2025

Related articles

A Survey on Efficient Protein Language Models
Authors: Shouren Wang, Debargha Ganguly, Vinooth Kulkarni, Wang Yang, Zhuoran Qiao, Daniel Blankenberg, Vipin Chaudhary, Xiaotian Han
This article has no evaluations. Latest version: Dec 24, 2025

In-Context Learning in Genomic Language Models as a Biological Evaluation Task
Authors: Aadit Kapoor, Wendy Lee
This article has no evaluations. Latest version: Dec 9, 2025

Emergence of Biological Structural Discovery in General-Purpose Language Models
Author: Liang Wang
This article has no evaluations. Latest version: Jan 8, 2026
