CatIF-RL: Activity-Oriented Enzyme Sequence Design by Steered Inverse Protein Folding
Abstract
Protein inverse folding models generate amino acid sequences compatible with a given backbone structure, but they are not explicitly optimized for specific biological functions. Here, we present CatIF-RL, a framework that steers a graph-based denoising diffusion inverse folding model toward enzyme variants with enhanced catalytic activity. CatIF-RL first adapts the inverse folding model to enzyme structural data, then introduces activity-oriented preference signals using the predicted catalytic constant (kcat) as the optimization objective, enabling specialization through generative dataset curation and group-relative policy optimization (GRPO). This process iteratively shifts the sequence distribution toward higher predicted kcat while limiting sequence divergence so that designs remain compatible with the input structure. On an independent benchmark, CatIF-RL achieves an approximately four-fold increase in predicted kcat relative to native enzymes, substantially outperforming representative inverse folding methods, while maintaining sequence recovery (0.55) and structural fidelity and supporting motif-preserving partial sequence design. CatIF-RL establishes a practical framework for activity-oriented enzyme design and provides a generalizable strategy for steering structure-conditioned protein generation toward functional optimization.
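To make the GRPO step more concrete, the following is a minimal sketch of the group-relative advantage that GRPO-style methods typically compute: several sequences are sampled for the same backbone, each is scored, and the scores are standardized within the group. This is not the authors' implementation. The function name `group_relative_advantages`, the example kcat values, and the log10 transform of predicted kcat are all assumptions made for illustration, and the policy update itself (clipping, any divergence penalty) is omitted.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples.

    Each reward is centered on the group mean and scaled by the group
    (population) standard deviation; eps guards against division by zero
    when all rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical predicted kcat values (1/s) for four sequences sampled
# for the same input backbone. These numbers are made up.
predicted_kcat = [2.0, 8.0, 4.0, 16.0]

# Assumption: use log10(kcat) as the reward so that fold-changes,
# not absolute differences, drive the preference signal.
rewards = [math.log10(k) for k in predicted_kcat]

advantages = group_relative_advantages(rewards)

for k, a in zip(predicted_kcat, advantages):
    print(f"predicted kcat = {k:5.1f}  advantage = {a:+.3f}")
```

Sequences with above-average predicted kcat within their group receive positive advantages and are reinforced, while below-average ones are discouraged. Because the baseline is the group mean, no separate value network is needed.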