Efficient Protein Engineering via Integrated Language Models and Bayesian Optimization

Joshua Meehl
Prasad Siddavatam

Abstract

This study investigates how advanced predictive models can reduce the cost and effort of protein engineering campaigns. We explore protein language models (PLMs), a variant of large language models (LLMs), for predicting functional performance from protein sequence. A common challenge in this domain is the scarcity of functional data; to address it, we examine zero-shot and few-shot learning methods. A second challenge is efficiently searching the vast fitness landscape for superior protein variants, for which we evaluate search methods such as Bayesian optimization. The proposed methods are evaluated on a benchmark of 34 protein datasets, each containing sequences paired with their quantified functional values. Our findings demonstrate the potential of these predictive models to streamline and accelerate the protein engineering process.
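The abstract does not specify which surrogate model or acquisition function the Bayesian optimization uses. As a rough illustration only, the sketch below shows one common form of pool-based Bayesian optimization: a Gaussian-process surrogate with an RBF kernel, scored by expected improvement. It assumes each candidate variant is already represented by a numeric feature vector (for example, a PLM embedding) and that a hypothetical `oracle(i)` returns the measured fitness of variant `i`. The names `bayes_opt` and `gp_posterior`, the kernel length scale, the noise level, and the toy data are all illustrative assumptions, not the authors' method.

```python
import math
import random


def rbf(a, b, length_scale):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def _cholesky(K):
    """Return lower-triangular L with K = L L^T (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def _solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def _solve_upper_from_lower(L, b):
    """Back substitution: solve L^T x = b using the lower factor L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X, y, x_new, length_scale=0.3, noise=1e-4):
    """Posterior mean and std at x_new of a zero-mean GP fit to centered y."""
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    L = _cholesky(K)
    alpha = _solve_upper_from_lower(L, _solve_lower(L, yc))  # K^{-1} yc
    k_star = [rbf(a, x_new, length_scale) for a in X]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = _solve_lower(L, k_star)
    var = max(rbf(x_new, x_new, length_scale) - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    """EI for maximization: E[max(f - best, 0)] with f ~ N(mean, std^2)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayes_opt(features, oracle, n_init=3, budget=5, seed=0):
    """Pool-based Bayesian optimization; returns (queried indices, observed values)."""
    rng = random.Random(seed)
    queried = rng.sample(range(len(features)), n_init)  # random initial batch
    values = [oracle(i) for i in queried]
    for _ in range(budget):
        remaining = [i for i in range(len(features)) if i not in queried]
        if not remaining:
            break
        X = [features[i] for i in queried]
        best = max(values)

        def ei(i):
            mean, std = gp_posterior(X, values, features[i])
            return expected_improvement(mean, std, best)

        nxt = max(remaining, key=ei)  # candidate with highest expected improvement
        queried.append(nxt)
        values.append(oracle(nxt))
    return queried, values


if __name__ == "__main__":
    # Toy pool: a 5x5 grid of 2-D points standing in for (hypothetical) PLM embeddings.
    pool = [[i / 4.0, j / 4.0] for i in range(5) for j in range(5)]

    def assay(i):
        # Hypothetical stand-in for a wet-lab fitness measurement.
        x, y = pool[i]
        return -((x - 0.75) ** 2) - (y - 0.25) ** 2

    queried, values = bayes_opt(pool, assay, n_init=3, budget=5)
    best = max(values)
    print("best index:", queried[values.index(best)], "fitness:", best)
```

Every GP step is written out in standard-library Python and refit from scratch each round, which favors transparency over speed. For realistic candidate pools, a vectorized GP or Bayesian-optimization library would be the practical choice.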

Version published on bioRxiv (10.1101/2025.09.30.679490), Oct 2, 2025

Related articles

A Survey on Efficient Protein Language Models

Authors:
1. Shouren Wang
2. Debargha Ganguly
3. Vinooth Kulkarni
4. Wang Yang
5. Zhuoran Qiao
6. Daniel Blankenberg
7. Vipin Chaudhary
8. Xiaotian Han
Latest version: Dec 24, 2025

Bayesian Optimization for Biochemical Discovery with LLMs

Authors:
1. Rafael Gómez-Bombarelli
2. Mattias Akke
3. Soojung Yang
4. Jurgis Ruza
5. Jinyeop Song
6. Elton Pan
Latest version: Jan 22, 2026

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

Authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
Latest version: Dec 11, 2025
