ProteinReasoner: A Multi-Modal Protein Language Model with Chain-of-Thought Reasoning for Efficient Protein Design
Abstract
Protein language models (PLMs) have advanced the understanding and engineering of proteins by learning rich representations from large-scale sequence data. However, sequence-only models are limited in their ability to capture the structural and evolutionary constraints essential for protein tasks. Although recent multi-modal PLMs integrate sequence and structure, they often fail to explicitly model the stepwise reasoning processes fundamental to protein science, particularly the evolutionary constraints and decision-making logic critical for protein design and optimization. Here, we introduce ProteinReasoner, a multi-modal protein language model that explicitly incorporates the “evolutionary profile” as the intermediate reasoning step between the structure and sequence modalities within a chain-of-thought (CoT) framework. We demonstrate that ProteinReasoner achieves improved zero-shot performance in structure prediction, inverse protein folding, and fitness prediction tasks, consistently outperforming larger baselines including ESM3 and DPLM-2. Furthermore, we develop a novel in-context learning (ICL) paradigm for protein optimization that leverages ProteinReasoner’s reasoning capabilities to guide sequence generation based on prior experimental feedback. ProteinReasoner outperforms the conventional active learning paradigm in protein optimization tasks, achieving higher predictive accuracy and better generalization. ProteinReasoner offers a scalable, efficient, and generalizable framework for protein modeling and optimization, providing a practical path to accelerate protein engineering workflows and enhance mechanistic understanding of protein biology.
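To make the notion of an evolutionary profile concrete, the following is a minimal sketch under the common definition of a profile as per-position amino-acid frequencies over a multiple sequence alignment (MSA). The abstract does not state how ProteinReasoner computes or encodes its profile, so the function name `evolutionary_profile`, the gap handling, and the toy alignment are illustrative assumptions, not the authors' method.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def evolutionary_profile(msa):
    """Return per-position amino-acid frequencies for aligned sequences.

    Hypothetical helper: assumes `msa` is a list of equal-length strings
    and that any character outside AMINO_ACIDS (e.g. '-') is a gap.
    """
    if not msa:
        raise ValueError("msa must contain at least one sequence")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for column in zip(*msa):  # iterate over alignment columns
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values())
        # Frequencies are computed over non-gap residues only.
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


# Toy alignment (made up for illustration, not real data).
toy_msa = ["MKTA", "MRTA", "MKSA", "MK-A"]
profile = evolutionary_profile(toy_msa)
```

In this toy alignment the first column is fully conserved, so `profile[0]["M"]` is 1.0, while the second column mixes K and R. A profile of this kind summarizes which substitutions evolution tolerates at each position, which is the sort of constraint the abstract places between structure and sequence.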