Design of peptides with non-canonical amino acids using flow matching
Abstract
The canonical vocabulary of twenty amino acids limits the chemical space available to proteins and peptides. Expanding this vocabulary to hundreds of non-canonical amino acids allows the engineering of proteins with novel function and activity, and is of great interest for the discovery of novel drugs such as macrocyclic peptides. Here we present NCFlow, a flow-based generative model capable of incorporating arbitrary non-canonical amino acids into a given protein. To supplement the sparse training data in the Protein Data Bank, NCFlow is pretrained on millions of small-molecule structures and a large set of protein-ligand complexes before being fine-tuned on native non-canonical amino acids found within proteins in the Protein Data Bank. We show that NCFlow outperforms AlphaFold3-based methods in structure prediction for unseen non-canonical amino acids. We present a peptide design pipeline akin to in silico deep mutational scanning, and propose a novel scoring strategy that combines deep learning-based and molecular dynamics-based alchemical binding free energy calculations to identify improved peptide variants. We apply the method to four protein-peptide complex test cases and observe that incorporating non-canonical amino acids can significantly improve binding affinity, with binding free energy changes of up to -7.0 kcal/mol. NCFlow can thus be easily integrated into existing protein design platforms to improve the properties of designed peptides beyond what is achievable with standard amino acids.