Leveraging protein representations to explore uncharted fold spaces with generative models

Yangyang Miao
Martin Pacesa
Sandrine Georgeon
Joseph Schmidt
Tianyu Lu
Po-Ssu Huang
Bruno Correia

Abstract

A major challenge in computational de novo protein design is the exploration of uncharted regions of protein structural space. The many degrees of freedom of protein backbones complicate sampling during design. Machine learning-based models have made great strides on this problem; however, by their nature they tend to exploit rather than explore the data distribution used to train them. To address some of these challenges, we propose DiffTopo, a diffusion-based generative method that operates on a coarse-grained protein structure representation and increases sampling efficiency and diversity. Combined with a backbone-level protein generative model such as RFdiffusion, it allows novel protein folds to be generated rapidly, enabling efficient exploration of the designable topology space. Interestingly, we discovered that mirroring the topological organization of native proteins, using a pipeline named MirrorTopo, readily expands the known fold space. We generated and experimentally characterized 30 different novel topologies from DiffTopo and 6 different novel mirror topologies from MirrorTopo. The framework, which relies on low-resolution sampling, provides new means of addressing fold exploration challenges and could in principle enhance our knowledge of the first principles of protein structure and folding, as well as create new opportunities for functional design.

Version published to 10.1101/2025.10.10.681606 on bioRxiv
Oct 10, 2025
