CatIF-RL: Activity-Oriented Enzyme Sequence Design by Steered Inverse Protein Folding

Abstract

Protein inverse folding models generate amino acid sequences compatible with a given backbone structure, but they are not explicitly optimized for specific biological functions. Here, we present CatIF-RL, a framework that steers a graph-based denoising diffusion inverse folding model toward designing enzyme variants with enhanced catalytic activity. CatIF-RL first adapts the inverse folding model to enzyme structural data, then introduces activity-oriented preference signals using the predicted catalytic constant (k_cat) as the optimization objective, enabling specialization through generative dataset curation and group-relative policy optimization (GRPO). This process iteratively shifts the sequence distribution toward higher predicted k_cat while constraining divergence so that generated sequences remain compatible with the input structure. On an independent benchmark, CatIF-RL achieves an approximately four-fold increase in predicted k_cat relative to native enzymes, substantially outperforming representative inverse folding methods, while maintaining sequence recovery (0.55) and structural fidelity and supporting motif-preserving partial sequence design. CatIF-RL establishes a practical framework for activity-oriented enzyme design and provides a generalizable strategy for steering structure-conditioned protein generation toward functional optimization.
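To illustrate the group-relative step that gives GRPO its name, the sketch below standardizes rewards within a group of sequences sampled for the same backbone, so that each sequence is scored relative to its siblings rather than on an absolute scale. This is a minimal sketch of the generic GRPO advantage, not the CatIF-RL implementation: the function name `group_relative_advantages`, the example k_cat values, and the use of log10(k_cat) as the reward are all assumptions made for illustration, since the abstract does not specify the reward shaping.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled sequences.

    Each advantage is (reward - group mean) / (group std + eps), the
    generic GRPO formulation. `eps` guards against a zero std when all
    rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical predicted k_cat values (s^-1) for four sequences
# sampled for a single backbone; not data from the paper.
predicted_kcat = [1.0, 10.0, 100.0, 1000.0]

# Assumption: a log-scaled reward, since k_cat spans orders of magnitude.
rewards = [math.log10(k) for k in predicted_kcat]

advantages = group_relative_advantages(rewards)

for k, a in zip(predicted_kcat, advantages):
    print(f"k_cat={k:>7.1f}  advantage={a:+.3f}")
```

Sequences with above-average predicted k_cat in their group receive positive advantages and are reinforced; those below average are discouraged. In a full training loop these advantages would weight the policy's log-likelihood terms, typically alongside a divergence penalty against a reference model, which is one plausible way to realize the constraint on sequence divergence described above.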
