FlexRibbon: Joint Sequence and Structure Pretraining for Protein Modeling
Abstract
Protein foundation models have advanced rapidly, with most approaches falling into two dominant paradigms. Sequence-only language models (e.g., ESM-2) capture sequence semantics at scale but lack structural grounding. MSA-based predictors (e.g., AlphaFold 2/3) achieve accurate folding by exploiting evolutionary couplings, but their dependence on homologous sequences makes them less reliable in highly mutated or alignment-sparse regimes. We present FlexRibbon, a pretrained protein model that jointly learns from amino acid sequences and three-dimensional structures. Our pretraining strategy combines masked language modeling with diffusion-based denoising, enabling bidirectional sequence-structure learning without requiring MSAs. Trained on both experimentally resolved structures and AlphaFold 2 predictions, FlexRibbon captures global folds as well as the flexible conformations critical for biological function. Evaluated across diverse tasks spanning interface design, intermolecular interaction prediction, and protein function prediction, FlexRibbon establishes new state-of-the-art performance on 12 tasks, with particularly strong gains in mutation-rich settings where MSA-based methods often struggle.
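To make the combined objective concrete, the sketch below shows one way a masked language modeling loss over amino-acid tokens can be trained jointly with a diffusion-style denoising loss over backbone coordinates, as the abstract describes. It is a minimal illustration only: the model, vocabulary size, noise schedule (a single fixed noise scale), and equal loss weighting are assumptions for exposition, not FlexRibbon's actual architecture or training recipe.

```python
# Minimal sketch of joint sequence-structure pretraining:
# masked language modeling on residue tokens + denoising on noisy coordinates.
# All names, dimensions, and weightings are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 21          # 20 amino acids + 1 mask token (assumed vocabulary)
MASK_ID = 20
D_MODEL = 128


class JointEncoder(nn.Module):
    """Toy transformer over residue tokens plus noisy CA coordinates."""

    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, D_MODEL)
        self.xyz_proj = nn.Linear(3, D_MODEL)      # embed noisy coordinates
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.seq_head = nn.Linear(D_MODEL, VOCAB)  # predict masked residues
        self.xyz_head = nn.Linear(D_MODEL, 3)      # predict the added noise

    def forward(self, tokens, noisy_xyz):
        h = self.tok_emb(tokens) + self.xyz_proj(noisy_xyz)
        h = self.encoder(h)
        return self.seq_head(h), self.xyz_head(h)


def pretrain_step(model, tokens, xyz, mask_frac=0.15, sigma=1.0):
    """One combined MLM + denoising step (single fixed noise scale)."""
    # Sequence side: randomly mask residues and predict the originals.
    mask = torch.rand_like(tokens, dtype=torch.float) < mask_frac
    masked_tokens = tokens.masked_fill(mask, MASK_ID)

    # Structure side: corrupt coordinates with Gaussian noise, predict the noise.
    noise = torch.randn_like(xyz) * sigma
    noisy_xyz = xyz + noise

    logits, noise_pred = model(masked_tokens, noisy_xyz)

    mlm_loss = (F.cross_entropy(logits[mask], tokens[mask])
                if mask.any() else logits.sum() * 0.0)
    denoise_loss = F.mse_loss(noise_pred, noise)
    return mlm_loss + denoise_loss   # equal weighting is an assumption


# Usage on random data: a batch of 2 proteins, 64 residues each.
model = JointEncoder()
tokens = torch.randint(0, 20, (2, 64))
xyz = torch.randn(2, 64, 3)
loss = pretrain_step(model, tokens, xyz)
loss.backward()
print(float(loss))
```

Because both corruptions are applied to the same input and decoded from a shared representation, gradients from the structure loss shape the sequence features and vice versa, which is the sense in which such an objective can learn bidirectionally across modalities without MSAs.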