FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning
Abstract
Accurate protein representations that integrate sequence and three-dimensional (3D) structure are critical to many biological and biomedical tasks. Most existing models either ignore structure or combine it with sequence through a single, static fusion step. Here we present FusionProt, a unified model that learns representations via iterative, bidirectional fusion between a protein language model and a structure encoder. A single learnable token serves as a carrier, alternating between sequence attention and spatial message passing across layers. FusionProt is evaluated on Enzyme Commission (EC), Gene Ontology (GO), and mutation stability prediction tasks. It improves Fmax by a median of +1.3 points (up to +2.0) across EC and GO benchmarks, and boosts AUROC by +3.6 points over the strongest baseline on mutation stability. Inference cost remains practical, with only ∼2–5% runtime overhead. Beyond state-of-the-art performance, we further demonstrate FusionProt’s practical relevance through representative biological case studies, suggesting that the model captures biologically relevant features.
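The carrier-token idea described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes identity attention projections, mean-aggregation message passing, and no normalization, and the names `self_attention`, `message_passing`, and `fuse` are hypothetical. It only shows how one shared vector might alternate between a sequence-attention step and a graph step so that information flows in both directions.

```python
import math

def self_attention(rows):
    """Single-head dot-product self-attention with identity projections
    (an illustrative assumption) and a residual connection."""
    d = len(rows[0])
    out = []
    for q in rows:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in rows]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        mixed = [sum(w * v[j] for w, v in zip(weights, rows)) / z for j in range(d)]
        out.append([x + y for x, y in zip(q, mixed)])
    return out

def message_passing(nodes, edges):
    """Mean-aggregation message passing over undirected edges, with a
    residual connection. Isolated nodes receive a zero message."""
    d = len(nodes[0])
    neighbors = {i: [] for i in range(len(nodes))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    out = []
    for i, h in enumerate(nodes):
        nbrs = neighbors[i]
        if nbrs:
            mean = [sum(nodes[j][k] for j in nbrs) / len(nbrs) for k in range(d)]
        else:
            mean = [0.0] * d
        out.append([x + y for x, y in zip(h, mean)])
    return out

def fuse(seq_tokens, struct_nodes, contact_edges, fusion_token, num_layers=2):
    """Alternate the shared fusion token between the two modalities.

    In each layer the token first joins the sequence tokens for attention,
    then joins the residue graph as an extra node linked to every residue
    (this all-to-one connectivity is an assumption of the sketch).
    """
    n = len(struct_nodes)
    edges = list(contact_edges) + [(i, n) for i in range(n)]
    for _ in range(num_layers):
        seq = self_attention([fusion_token] + seq_tokens)
        fusion_token, seq_tokens = seq[0], seq[1:]
        graph = message_passing(struct_nodes + [fusion_token], edges)
        struct_nodes, fusion_token = graph[:n], graph[n]
    return seq_tokens, struct_nodes, fusion_token

# Toy usage: three residues with 2-dimensional features and a chain of contacts.
seq_out, struct_out, token_out = fuse(
    seq_tokens=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    struct_nodes=[[0.5, 0.5], [0.1, 0.2], [0.3, 0.4]],
    contact_edges=[(0, 1), (1, 2)],
    fusion_token=[0.0, 0.0],
)
```

With two or more layers, the sequence-side outputs depend on the structural features, because the token carries what it gathered from the graph step back into the next attention step. A single static fusion step would not have this property.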