Examining Selection Dynamics and Limitations in Multi-round Protein Selection of High Diversity Libraries

John Z. Chen
Barnabas Gall
Tommy Y. Lu
Isabella Heslop
Daniel Hesselson
Christoph Nitsche
Wai-Hong Tham
Richard J. Payne
Colin J. Jackson

Abstract

Proteins and peptides underpin essential biological functions and technological applications, from targeting disease-relevant interactions to providing broad enzymatic activities. However, engineering molecules with desired properties remains difficult, owing to complex sequence-structure-function relationships and the lack of data on specific systems. Experimental selection strategies, including directed evolution, phage display, and mRNA display, address this challenge by leveraging high-diversity libraries and iterative enrichment under defined selection pressures. These strategies identify candidates without requiring extensive prior knowledge and can generate extensive datasets for use in machine learning. While many selection systems exist, comparisons across selection approaches are hindered by the lack of a unifying analytical framework. Here, we present a set of broadly applicable analyses for assessing selection dynamics in multi-round or multi-condition experiments, ranging from position-level analysis of sequence properties to full sequence-space mappings through protein language model embeddings. Using the toolset to analyze a variety of datasets in parallel, we explore the potential effects of diversity, coverage, and reproducibility, offering generalizable insights to guide experimental design, interpretation, and troubleshooting across protein and peptide discovery platforms.
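
To make the idea of position-level analysis and round-to-round enrichment concrete, the following is a minimal sketch, not the authors' toolset. It assumes equal-length sequences given as plain strings, and the function names `position_entropy` and `enrichment` as well as the pseudocount scheme are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def position_entropy(seqs):
    """Shannon entropy (bits) of residue usage at each position.

    Assumes all sequences have the same length (illustrative assumption).
    """
    length = len(seqs[0])
    entropies = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


def enrichment(prev_round, next_round, pseudocount=1.0):
    """log2 fold-change in frequency of each variant between two rounds.

    A pseudocount is added to every variant in both rounds so that
    variants absent from one round do not cause division by zero.
    """
    prev_counts, next_counts = Counter(prev_round), Counter(next_round)
    variants = set(prev_counts) | set(next_counts)
    prev_total = len(prev_round) + pseudocount * len(variants)
    next_total = len(next_round) + pseudocount * len(variants)
    scores = {}
    for v in variants:
        prev_freq = (prev_counts[v] + pseudocount) / prev_total
        next_freq = (next_counts[v] + pseudocount) / next_total
        scores[v] = math.log2(next_freq / prev_freq)
    return scores
```

Under these assumptions, a drop in per-position entropy across rounds would suggest convergence at that position, while positive enrichment scores mark variants whose frequency rose between rounds. Real selection data would additionally need handling for read-depth differences and sequencing error, which this sketch leaves out.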

Version published to 10.1101/2025.11.09.687419 on bioRxiv
Nov 9, 2025
