All-Atom Protein Generation with Latent Diffusion

Amy X. Lu
Wilson Yan
Sarah A. Robinson
Simon Kelow
Kevin K. Yang
Vladimir Gligorijevic
Kyunghyun Cho
Richard Bonneau
Pieter Abbeel
Nathan C. Frey

Abstract

While generative models hold immense promise for protein design, existing models are typically backbone-only, despite the indispensable role that sidechain atoms play in mediating function. All-atom 3D structure generation requires the discrete sequence as a prerequisite, since the sequence specifies sidechain identities; this poses a multimodal generation problem. We propose PLAID (Protein Latent Induced Diffusion), which samples from the latent space of a pre-trained sequence-to-structure predictor, ESMFold. The sampled latent embedding is then decoded with frozen decoders into the sequence and all-atom structure. Importantly, PLAID only requires sequence input during training, thus augmenting the dataset size by 2-4 orders of magnitude compared to the Protein Data Bank. It also makes more annotations available for functional control. As a demonstration of annotation-based prompting, we perform compositional conditioning on function and taxonomy using classifier-free guidance. Intriguingly, function-conditioned generations learn active site residue identities, even though these residues are non-adjacent in the sequence, and can correctly place the sidechain atoms. We further show that PLAID can generate transmembrane proteins with expected hydrophobicity patterns, perform motif scaffolding, and improve unconditional sample quality for long sequences. Model weights and training code are publicly available at github.com/amyxlu/plaid.
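
The classifier-free guidance step mentioned in the abstract can be illustrated with a minimal sketch. This is the generic guidance rule, not PLAID's actual code: the denoiser signature, the `toy_denoiser` stand-in, and the condition labels are all hypothetical, and the abstract does not say how the function and taxonomy conditions are combined inside the model.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical denoiser signature (an assumption, not PLAID's API):
# (noisy latent, condition dict or None) -> predicted noise
Denoiser = Callable[[Sequence[float], Optional[dict]], List[float]]


def cfg_noise(
    denoiser: Denoiser,
    x_t: Sequence[float],
    cond: Optional[dict],
    guidance_scale: float,
) -> List[float]:
    """Generic classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the conditional one."""
    eps_uncond = denoiser(x_t, None)   # condition dropped
    eps_cond = denoiser(x_t, cond)     # condition supplied
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def toy_denoiser(x_t: Sequence[float], cond: Optional[dict]) -> List[float]:
    """Stand-in for a trained network: conditioning shifts the output by 1."""
    shift = 0.0 if cond is None else 1.0
    return [x + shift for x in x_t]


# Placeholder labels for a joint function + taxonomy condition.
condition = {"function": "example_function", "taxonomy": "example_organism"}
guided = cfg_noise(toy_denoiser, [0.0, 2.0], condition, guidance_scale=3.0)
print(guided)  # [3.0, 5.0]
```

With a guidance scale of 0 the rule returns the unconditional prediction, with 1 it returns the conditional prediction, and larger values push samples further toward the condition.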

Version published to 10.1101/2024.12.02.626353v2 on bioRxiv
Feb 13, 2025
Version published to 10.1101/2024.12.02.626353v1 on bioRxiv
Dec 5, 2024

Related articles

ProtoBind-Diff: A Structure-Free Diffusion Language Model for Protein Sequence-Conditioned Ligand Design

This article has 4 authors:
1. Lukia Mistryukova
2. Vladimir Manuilov
3. Konstantin Avchaciov
4. Peter O. Fedichev
Latest version Jul 10, 2025

DPAC: Prediction and Design of Protein-DNA Interactions via Sequence-Based Contrastive Learning

This article has 4 authors:
1. Leo Tianlai Chen
2. Rishab Pulugurta
3. Pranay Vure
4. Pranam Chatterjee
Latest version May 19, 2025

Rapid and accurate protein structure database search using inverse folding model and contrastive learning

This article has 5 authors:
1. Qiuyi Lyu
2. Hong Wei
3. Shuaishuai Chen
4. Zhenling Peng
5. Jianyi Yang
Latest version May 20, 2025