Fine-tuning of conditional Transformers for the generation of functionally characterized enzymes

Abstract

We introduce Finenzyme, a Protein Language Model (PLM) that employs a multifaceted learning strategy based on transfer learning from a decoder-based Transformer, conditional learning using specific functional keywords, and fine-tuning to model specific Enzyme Commission (EC) categories. Using Finenzyme, we investigate the conditions under which fine-tuning enhances the prediction and generation of EC categories, showing a two-fold perplexity improvement on EC-specific categories compared to a generalist model. Our extensive experimentation shows that Finenzyme-generated sequences can be very different from natural ones while retaining tertiary structures, functions and chemical kinetics similar to those of their natural counterparts. Importantly, the embedded representations of the generated enzymes closely resemble those of natural ones, making them suitable for downstream tasks. Finally, we illustrate how Finenzyme can be used in practice to generate enzymes characterized by specific functions via in-silico directed evolution, a computationally inexpensive PLM fine-tuning procedure that significantly enhances and assists targeted enzyme engineering tasks.
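The "two-fold perplexity improvement" reported above is the standard language-model metric: the exponential of the negative mean per-token log-likelihood, so a model that assigns higher probability to each residue of a sequence scores a lower perplexity. A minimal sketch of this comparison, with hypothetical per-residue log-probabilities standing in for the generalist and fine-tuned models (the numbers are illustrative, not from the paper):

```python
import math

def perplexity(token_log_probs):
    """Perplexity of a sequence from per-token natural-log
    probabilities: exp of the negative mean log-likelihood."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Hypothetical per-residue log-probs for the same enzyme sequence:
# an EC-specific fine-tuned model assigns each residue a higher
# probability than a generalist PLM, hence a lower perplexity.
generalist_logps = [math.log(0.05)] * 10  # diffuse predictions
fine_tuned_logps = [math.log(0.20)] * 10  # sharper after fine-tuning

print(perplexity(generalist_logps))  # ≈ 20
print(perplexity(fine_tuned_logps))  # ≈ 5
```

In practice the log-probabilities would come from scoring held-out sequences of one EC category under each model; the ratio of the two perplexities quantifies the specialization gain.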
