FoldToken4: Consistent & Hierarchical Fold Language
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Creating protein structure language has attracted increasing attention in unifing the modality of protein sequence and structure. While recent works, such as FoldToken1&2&3 have made great progress in this direction, the relationship between languages created by different models at different scales is still unclear. Moreover, models at multiple scales (different code space size, like 2 5 , 2 6 , ⋯, 2 12 ) need to be trained separately, leading to redundant efforts. We raise the question: Could a single model create multiscale fold languages? In this paper, we propose FoldToken4 to learn the consistent and hierarchical of multiscale fold languages. By introducing multiscale code adapters and token mixing techniques, FoldToken4 can generate multiscale languages from the same model, and discover the hierarchical token-mapping relationships across scales. To the best of our knowledge, FoldToken4 is the first effort to learn multi-scale token consistency and hierarchy in VQ research; Also, it should be more novel in protein structure language learning.