Leveraging protein representations to explore uncharted fold spaces with generative models

Abstract

A major challenge in computational de novo protein design is the exploration of uncharted areas of protein structural space. The many degrees of freedom of protein backbones complicate sampling during design. Machine learning-based models have made great strides on this problem; however, by their nature they tend to exploit rather than explore the data distribution used to train them. To address some of these challenges, we propose DiffTopo, a diffusion-based generative method that operates on a coarse-grained protein structure representation and increases sampling efficiency and diversity. Combined with a backbone-level protein generative model such as RFdiffusion, it allows novel protein folds to be generated rapidly, enabling efficient exploration of the designable topology space. Interestingly, we discovered that by mirroring the topological organization of native proteins, using a pipeline named MirrorTopo, we can readily expand the known fold space. We generated and experimentally characterized 30 different novel topologies from DiffTopo and 6 different novel mirror topologies from MirrorTopo. The framework, which relies on low-resolution sampling, provides a new means of tackling fold exploration, which could in principle enhance our knowledge of the first principles of protein structure and folding, as well as create new opportunities for functional design.
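The mirroring idea can be illustrated with a minimal geometric sketch. This is not the MirrorTopo pipeline, whose details the abstract does not give; it only shows the general fact that reflecting a set of coarse-grained points through a plane preserves all pairwise distances while inverting handedness. The four-point "topology", the function names, and the choice of the x = 0 mirror plane are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def mirror(points: List[Point]) -> List[Point]:
    """Reflect every point through the plane x = 0 (an arbitrary choice)."""
    return [(-x, y, z) for x, y, z in points]


def signed_volume(a: Point, b: Point, c: Point, d: Point) -> float:
    """Scalar triple product (b-a) . ((c-a) x (d-a)); its sign encodes handedness."""
    u = tuple(b[i] - a[i] for i in range(3))
    v = tuple(c[i] - a[i] for i in range(3))
    w = tuple(d[i] - a[i] for i in range(3))
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]


def pairwise_distances(points: List[Point]) -> List[float]:
    """All pairwise Euclidean distances, in a fixed order."""
    return [
        math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]


# Hypothetical coarse-grained layout: four element centres (not real data).
layout: List[Point] = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]
mirrored = mirror(layout)

print(signed_volume(*layout), signed_volume(*mirrored))
print(pairwise_distances(layout) == pairwise_distances(mirrored))
```

The sign of the triple product flips under reflection while the distance list is unchanged, so a mirrored layout is locally identical in geometry yet cannot be superimposed on the original by rotation alone.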
