RareFold: Structure prediction and design of proteins with noncanonical amino acids
Abstract
Protein structure prediction and design have traditionally been limited to the 20 canonical amino acids. Expanding this space to include noncanonical amino acids (NCAAs) offers new opportunities for probing novel interactions and engineering proteins with enhanced or entirely new functions. Here, we present RareFold, a deep learning model capable of accurate structure prediction for proteins containing both the 20 canonical amino acids and an additional 29 NCAAs. By treating each amino acid as a distinct token, RareFold learns residue-specific atomic interaction patterns, enabling precise modelling of chemically diverse sequences. This tokenised representation also supports sequence-structure co-optimisation, allowing efficient inverse design. We leverage this capability in EvoBindRare, a design framework for generating linear and cyclic peptide binders that incorporate NCAAs. Applying EvoBindRare, we design binders targeting a ribonuclease and validate them experimentally. We obtain new linear and cyclic binders that harbour novel chemical interactions and bind with the same affinity as wild-type binders. Immunogenicity profiling indicates that these designs do not exhibit increased immune activation relative to the wild-type sequence, supporting their potential suitability for in vivo applications. RareFold enables binder design with an expanded chemical vocabulary, opening the door to next-generation peptide therapeutics with both linear and cyclic topologies. RareFold is available at: https://github.com/patrickbryant1/RareFold