Protein Language Model Supervised Scalable Approach for Diverse and Designable Protein Motif-Scaffolding with GPDL

Haifeng Chen
Bo Zhang
Kexin Liu
Zhuoqi Zheng
Junjie Zhu
Zhengxin Li
Yunfeiyang Liu
Junxi Mu
Ting Wei

Abstract

Proteins perform essential roles in numerous biological processes, largely driven by the three-dimensional structure of several key motif residues. Recently, a variety of energy-based and machine learning backbone generation methods have been developed to solve the motif-scaffolding task. However, it remains challenging to generate diverse and accurate scaffold structures around motifs, whether a model is fine-tuned from a pre-trained multiple sequence alignment-based (MSA-based) structure prediction model or trained from scratch. Here, we introduce Generative Protein Design by Language model (GPDL), which uses a protein language model in place of traditional MSA-based pretraining. Using our scalable design strategy, GPDL solved 22 of 24 benchmark problems and outperformed other methods, generating 33.5% more unique designable clusters than RFdiffusion. This demonstrates that our approach can generate accurate and physically plausible structures across diverse protein design scenarios. GPDL also showed strong robustness on orphan proteins that have low sequence similarity to the training set. Our approach underscores the promise of protein language models in protein design and has the potential to accelerate the discovery of novel functional proteins for a wide range of biological and therapeutic applications.

Version published to 10.21203/rs.3.rs-5450034/v1 on Research Square
Dec 4, 2024

Related articles

A Survey on Efficient Protein Language Models

This article has 8 authors:
1. Shouren Wang
2. Debargha Ganguly
3. Vinooth Kulkarni
4. Wang Yang
5. Zhuoran Qiao
6. Daniel Blankenberg
7. Vipin Chaudhary
8. Xiaotian Han
This article has no evaluations
Latest version Dec 24, 2025

Feature-Optimized Machine Learning Benchmarking for Protein Interface Prediction in Permanent Homodimer Complexes with Distinct Structural Features

This article has 4 authors:
1. Tayyip Topuz
2. Zeki Erdem
3. Halil Bisgin
4. E. Demet Akten
This article has no evaluations
Latest version Feb 2, 2026

Quantum-Assisted Refinement of AlphaFold Protein Structures

This article has 1 author:
1. Parham Ghayour
This article has no evaluations
Latest version Dec 31, 2025
