FusionProt: Fusing Sequence and Structural Information for Unified Protein Representation Learning

Abstract

Accurate protein representation is vital for diverse biological and biomedical applications. While three-dimensional (3D) structural context is central to protein function, most computational approaches either ignore it or fuse it with sequence information in a single late step, yielding limited benefits. We present FusionProt, a unified representation learning framework that iteratively exchanges information between a protein language model and a graph-based structure encoder via a single learnable fusion token. This early, bidirectional conditioning preserves structural cues across layers while maintaining near-constant complexity. Across EC and GO benchmarks, FusionProt achieves state-of-the-art results, improving Fmax by up to 3% over strong joint baselines; on mutation stability prediction it boosts AUROC by 5.1% versus the best structure model, with only 2–5% runtime overhead. We further demonstrate how analysing the specific gains in predictive capability can help focus attention and generate hypotheses about the underlying biological mechanisms.
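
To make the fusion-token idea concrete, the following is a minimal pure-Python sketch of how a single shared token could be passed back and forth between a sequence encoder and a structure encoder, layer by layer. It is an illustration under stated assumptions, not the FusionProt implementation: the toy attention layer, the mean-aggregation message passing, the treatment of the token as a virtual node linked to every residue, and all names (attend, sequence_layer, structure_layer, fuse) are hypothetical, and nothing here is learned. The abstract does not specify any of these details.

```python
import math

def attend(query, keys):
    """Softmax dot-product attention of one query over a list of key vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * key[j] for w, key in zip(weights, keys)) / total for j in range(d)]

def sequence_layer(residues, token):
    """Toy stand-in for one language-model layer (assumption).

    The fusion token is prepended to the residue states, and every position
    attends over all positions, so residues can read the token and vice versa.
    """
    states = [token] + residues
    out = [[0.5 * s + 0.5 * a for s, a in zip(state, attend(state, states))]
           for state in states]
    return out[1:], out[0]

def structure_layer(nodes, edges, token):
    """Toy stand-in for one graph message-passing layer (assumption).

    The fusion token acts as a virtual node connected to every residue node:
    each node averages its neighbours plus the token, and the token is
    refreshed from the mean of the node states.
    """
    n = len(nodes)
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    new_nodes = []
    for i in range(n):
        messages = [nodes[j] for j in neighbours[i]] + [token]
        mean = [sum(col) / len(messages) for col in zip(*messages)]
        new_nodes.append([0.5 * a + 0.5 * b for a, b in zip(nodes[i], mean)])
    pooled = [sum(col) / n for col in zip(*nodes)]
    new_token = [0.5 * t + 0.5 * p for t, p in zip(token, pooled)]
    return new_nodes, new_token

def fuse(residues, nodes, edges, token, num_layers=3):
    """Alternate sequence and structure layers, handing the token across each time."""
    for _ in range(num_layers):
        residues, token = sequence_layer(residues, token)
        nodes, token = structure_layer(nodes, edges, token)
    return residues, nodes, token

# Hypothetical toy input: 3 residues with 2-dimensional states and a chain graph.
residue_states = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
node_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contact_edges = [(0, 1), (1, 2)]
fusion_token = [0.0, 0.0]
seq_out, struct_out, token_out = fuse(residue_states, node_states, contact_edges, fusion_token)
```

In this sketch the token is the only channel between the two encoders, so the extra work per layer is one additional position in the sequence model and one additional node in the graph, which is one way to read the abstract's claim of near-constant added complexity. It also shows why the exchange is iterative: structural information can only reach the residue states from the second layer onward, after the token has passed through the structure encoder once.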