RareFold: Structure prediction and design of proteins with noncanonical amino acids

Abstract

Protein structure prediction and design have traditionally been limited to the 20 canonical amino acids. Expanding this space to include noncanonical amino acids (NCAAs) offers new opportunities for probing novel interactions and engineering proteins with enhanced or entirely new functions. Some NCAAs also offer practical advantages, such as increased proteolytic stability and, because they are rarely encountered by the human immune system, reduced immunogenicity. Here, we present RareFold, a deep learning model capable of accurate structure prediction for proteins containing both the 20 canonical amino acids and an additional 29 NCAAs. By treating each amino acid as a distinct token, RareFold learns residue-specific atomic interaction patterns, enabling precise modelling of chemically diverse sequences. This tokenised representation also supports sequence-structure co-optimisation, allowing efficient inverse design. We leverage this capability in EvoBindRare, a design framework for generating linear and cyclic peptide binders that incorporate NCAAs. Applying EvoBindRare, we design binders targeting a ribonuclease and validate them experimentally, obtaining μM affinity for both the linear and the cyclic designs. RareFold thus enables binder design with an expanded chemical vocabulary, opening the door to next-generation peptide therapeutics with improved stability, specificity, and immune evasion. RareFold is available at: https://github.com/patrickbryant1/RareFold
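
To make the idea of treating each amino acid as a distinct token concrete, the following is a minimal Python sketch of a residue-level vocabulary that places NCAAs alongside the 20 canonical residues. It is not taken from the RareFold code base: the function names `build_vocab` and `tokenise` are hypothetical, and the five NCAA codes are common PDB component identifiers chosen only for illustration, not the 29 NCAAs that RareFold supports.

```python
# One-letter codes for the 20 canonical amino acids.
CANONICAL = list("ACDEFGHIKLMNPQRSTVWY")

# Illustrative NCAA codes only (PDB-style three-letter component IDs).
# Assumption: this is NOT the actual set of 29 NCAAs used by RareFold.
EXAMPLE_NCAAS = ["MSE", "SEP", "TPO", "PTR", "AIB"]


def build_vocab(canonical, ncaas):
    """Give every residue type, canonical or not, its own integer token."""
    return {res: idx for idx, res in enumerate(list(canonical) + list(ncaas))}


def tokenise(residues, vocab):
    """Convert a list of residue codes into a list of token ids."""
    unknown = [r for r in residues if r not in vocab]
    if unknown:
        raise ValueError(f"Unsupported residue types: {unknown}")
    return [vocab[r] for r in residues]


vocab = build_vocab(CANONICAL, EXAMPLE_NCAAS)

# A short peptide mixing canonical residues with two example NCAAs.
peptide = ["G", "MSE", "K", "AIB", "W"]
tokens = tokenise(peptide, vocab)
print(tokens)
```

Under this scheme a design loop can mutate a position to any entry in the vocabulary, so a noncanonical residue is just one more choice rather than a special case. With the full set described in the abstract, the vocabulary would hold 20 + 29 = 49 residue tokens.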