FoldToken4: Consistent & Hierarchical Fold Language

Abstract

Creating a protein structure language has attracted increasing attention as a way to unify the protein sequence and structure modalities. While recent works such as FoldToken1, 2, and 3 have made great progress in this direction, the relationship between the languages created by different models at different scales remains unclear. Moreover, models at multiple scales (different code space sizes, such as 2^5, 2^6, ⋯, 2^12) must be trained separately, leading to redundant effort. We raise the question: could a single model create multiscale fold languages? In this paper, we propose FoldToken4 to learn consistent and hierarchical multiscale fold languages. By introducing multiscale code adapters and token-mixing techniques, FoldToken4 can generate multiscale languages from a single model and discover hierarchical token-mapping relationships across scales. To the best of our knowledge, FoldToken4 is the first effort to learn multi-scale token consistency and hierarchy in vector quantization (VQ) research, and it is even more novel in protein structure language learning.
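To make the idea of a single model emitting tokens at several code space sizes concrete, here is a minimal sketch of nested multiscale vector quantization. It assumes one shared codebook whose first 2^s entries serve as the scale-s code space, so a fine token can be mapped to a coarser one by re-quantizing its embedding at the smaller scale. All names and mechanics here are illustrative assumptions, not FoldToken4's actual architecture.

```python
# Hypothetical nested-codebook sketch (an assumption, not FoldToken4's design):
# one codebook of 2^12 entries; its first 2^s rows act as the scale-s code space.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(2**12, 16))  # 4096 codes, 16-dim embeddings

def quantize(x, scale):
    """Return the token id of x among the first 2**scale codebook entries."""
    sub = codebook[: 2**scale]
    dists = np.linalg.norm(sub - x, axis=1)
    return int(np.argmin(dists))

x = rng.normal(size=16)                       # a structure feature vector
tokens = {s: quantize(x, s) for s in (5, 6, 12)}

# Hierarchical token mapping across scales: re-quantize the fine code's
# embedding at a coarser scale to recover the corresponding coarse token.
fine_emb = codebook[tokens[12]]
coarse_from_fine = quantize(fine_emb, 5)
print(tokens, coarse_from_fine)
```

Because every scale shares one embedding table, the scale-to-scale token mapping is fixed and consistent by construction, which is one plausible way to realize the consistency and hierarchy the abstract describes.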
