Controllable All-Atom Protein Generation with Latent Diffusion
Abstract
Purpose:
Designing proteins with atomic-level functional control remains a central challenge in de novo design, exacerbated by the limited availability of structural data.
Methods:
We introduce PLAID (Protein Latent Induced Diffusion), a generative model that efficiently co-generates discrete sequence and all-atom structure by sampling directly in the shared sequence–structure latent of a pretrained sequence-to-structure predictor. Unlike existing de novo generative models, PLAID trains its diffusion model exclusively on sequences, expanding effective training corpora by 2–4 orders of magnitude relative to structural databases. Using classifier-free guidance, PLAID supports controllable generation based on function (Gene Ontology) and organism keywords.
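As an illustration of the classifier-free guidance mentioned above, the following is a minimal sketch of how a guided noise prediction is commonly formed from a conditional and an unconditional prediction. It shows the generic technique only and is not taken from the PLAID codebase; the function name `cfg_combine`, the argument names, and the parametrization (where a scale of 1 recovers the purely conditional prediction) are assumptions made for this example.

```python
from typing import List, Sequence


def cfg_combine(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Combine two noise predictions with classifier-free guidance.

    Hypothetical helper for illustration (not PLAID's API).
    eps_uncond: prediction with the condition dropped (e.g. no GO term).
    eps_cond:   prediction given the condition.
    guidance_scale: 0 -> unconditional, 1 -> conditional,
                    >1 -> extrapolate further toward the condition.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    # Move from the unconditional prediction along the direction
    # (conditional - unconditional), scaled by the guidance strength.
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]
    cond = [1.0, 3.0]
    print(cfg_combine(uncond, cond, 2.0))
```

Some papers write the same combination as (1 + w) times the conditional prediction minus w times the unconditional one; that form corresponds to a guidance scale of 1 + w in the sketch above.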
Results:
In silico, PLAID can unconditionally generate all-atom structures without explicit structural supervision during diffusion model training. Function-conditioned proteins recapitulate catalytic side-chain positions for residues at non-adjacent sequence positions, and generated transmembrane proteins show the expected hydrophobicity patterns and predicted topologies. We further validate experimentally that PLAID can be prompted to generate heme-binding proteins with high sequence novelty.
Conclusion:
Overall, PLAID unifies sequence-scale training with atomic-level generation, enabling more precise functional control in protein design. Code and weights: github.com/amyxlu/plaid.