Fine-tuning of conditional Transformers for the generation of functionally characterized enzymes

Abstract

We introduce Finenzyme, a Protein Language Model (PLM) that combines transfer learning from a decoder-based Transformer, conditional learning using specific functional keywords, and fine-tuning to model specific Enzyme Commission (EC) categories. Using Finenzyme, we investigate the conditions under which fine-tuning improves the prediction and generation of EC categories, showing a two-fold perplexity improvement on EC-specific categories compared with a generalist model. Our extensive experiments show that Finenzyme-generated sequences can differ substantially from natural ones while retaining the tertiary structures, functions, and chemical kinetics of their natural counterparts. Importantly, the embedded representations of the generated enzymes closely resemble those of natural ones, making them suitable for downstream tasks. Finally, we illustrate how Finenzyme can be used in practice to generate enzymes with specific functions via in-silico directed evolution, a computationally inexpensive PLM fine-tuning procedure that significantly enhances and assists targeted enzyme engineering.
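To make the conditioning scheme concrete, below is a minimal sketch of the general idea, not the authors' released code: fine-tuning a decoder-only protein language model by prepending an EC-category keyword token to each amino-acid sequence, then measuring held-out perplexity and generating conditioned on that keyword. The base model (`nferruz/ProtGPT2` stands in for the actual backbone), the EC token, the toy sequences, and all hyperparameters are illustrative assumptions.

```python
# Hedged sketch of conditional fine-tuning with an EC keyword token.
# Everything marked "placeholder" is an assumption, not from the paper.
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

BASE_MODEL = "nferruz/ProtGPT2"  # placeholder decoder-based PLM backbone
EC_TOKEN = "<EC:1.1.1.1>"        # placeholder conditioning keyword for one EC class

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Register the EC keyword as a special token and grow the embedding matrix
# so the model can learn a representation for it during fine-tuning.
tokenizer.add_special_tokens({"additional_special_tokens": [EC_TOKEN]})
model.resize_token_embeddings(len(tokenizer))

# Toy fine-tuning set: EC-keyword-prefixed sequences (placeholders, not real data).
train_sequences = [
    EC_TOKEN + "MKTLLLTLVVVTIVCLDLGYT",
    EC_TOKEN + "MALWMRLLPLLALLALWGPDPAAA",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for epoch in range(3):
    for seq in train_sequences:
        batch = tokenizer(seq, return_tensors="pt")
        # Causal-LM loss: the model learns to generate the sequence
        # conditioned on the prepended EC keyword.
        out = model(**batch, labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Held-out perplexity (lower is better; the paper reports a two-fold
# improvement over a generalist model on EC-specific categories).
model.eval()
with torch.no_grad():
    batch = tokenizer(EC_TOKEN + "MKTLLLTLVVVTIVCLDLGYT", return_tensors="pt")
    ppl = math.exp(model(**batch, labels=batch["input_ids"]).loss.item())
print(f"held-out perplexity: {ppl:.2f}")

# Generation conditioned on the EC keyword alone.
prompt = tokenizer(EC_TOKEN, return_tensors="pt")
gen = model.generate(**prompt, max_new_tokens=100, do_sample=True, top_p=0.95)
print(tokenizer.decode(gen[0], skip_special_tokens=False))
```

In this framing, the EC keyword acts as a learned control token: fine-tuning only has to specialize an already-pretrained generalist model toward one functional category, which is what keeps the procedure computationally inexpensive.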
