alpseq: an open-source workflow to turbocharge nanobody discovery with high-throughput sequencing
Abstract
Nanobodies have emerged as promising tools for many biotechnological applications due to their small size, high stability, and remarkable binding specificity. Next-Generation Sequencing (NGS) enables deep profiling of large nanobody libraries and panning campaigns; however, the scale and diversity of nanobody NGS datasets present a significant bioinformatic challenge. To address this, we have developed alpseq, an optimised, open-source software pipeline designed specifically for the efficient and accurate processing of NGS data from nanobody libraries and panning campaigns. alpseq is also paired with a PCR-free sequencing library preparation protocol that allows researchers to easily generate their own data while avoiding PCR-associated biases. The alpseq software pipeline is composed of two parts. The first is a pre-processing module, written in Nextflow, that efficiently handles raw nanobody reads with a single command. Its results are then fed into the second part, the analysis module, which contains a comprehensive suite of functions for quality control, diversity analysis, identification of enriched sequences, and clustering. alpseq also creates a user-friendly interactive report that lets scientists explore their data without needing extensive bioinformatic experience. Sophisticated panning campaign designs are supported, such as replicates and comparisons between different pans to find cross-binding leads. alpseq thus generates insights into the nanobody selection process and delivers a list of lead candidates for further experimental validation and downstream applications. alpseq is available at https://github.com/kzeglinski/alpseq.
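The abstract does not specify how alpseq scores enrichment or diversity. The sketch below is a rough illustration only, and it is not alpseq's implementation. It assumes that per-sequence read counts (for example, CDR3 counts) are already available from pre-processing. Enriched sequences are ranked by the log2 fold change in relative frequency between an earlier and a later panning round. A pseudocount is added so that sequences absent from one round do not cause division by zero. Diversity is summarised with Shannon entropy, one common metric among several. The function names and the example sequence labels are hypothetical.

```python
import math
from collections import Counter


def log2_enrichment(before: Counter, after: Counter, pseudocount: float = 1.0) -> dict:
    """Hypothetical helper: log2 fold change in relative frequency per sequence.

    Every sequence seen in either round gets a pseudocount, so sequences
    missing from one round still receive a finite score.
    """
    universe = set(before) | set(after)
    n = len(universe)
    total_before = sum(before.values()) + pseudocount * n
    total_after = sum(after.values()) + pseudocount * n
    scores = {}
    for seq in universe:
        # Counter returns 0 for missing keys, so absent sequences are handled.
        f_before = (before[seq] + pseudocount) / total_before
        f_after = (after[seq] + pseudocount) / total_after
        scores[seq] = math.log2(f_after / f_before)
    return scores


def shannon_diversity(counts: Counter) -> float:
    """Hypothetical helper: Shannon entropy (in nats) of sequence frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


# Hypothetical CDR3 counts from an unpanned library and a later panning round.
library = Counter({"CDR3_A": 50, "CDR3_B": 50, "CDR3_C": 50, "CDR3_D": 50})
round2 = Counter({"CDR3_A": 180, "CDR3_B": 15, "CDR3_C": 5})

scores = log2_enrichment(library, round2)
leads = sorted(scores, key=scores.get, reverse=True)
print(leads[0])  # CDR3_A
print(shannon_diversity(library) > shannon_diversity(round2))  # True
```

In this toy example, `CDR3_A` rises from a quarter of the library to most of the round-2 reads, so it ranks first. Diversity also drops as selection narrows the pool. Real panning analyses typically also account for replicates and sequencing depth, which this sketch ignores.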