cONcat: Computational reconstruction of concatenated fragments from long Oxford Nanopore reads

Alexander J. Petri
Mai Thi-Huyen Nguyen
Anjali Rajwar
Erik Benson
Kristoffer Sahlin

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Synthetic combinatorial DNA libraries are widely used to produce protein variants, optimize binders, and for high-throughput studies of protein-DNA interactions. The libraries can be made by researchers or vendors, and high-throughput sequencing is used for both quality control and to study the outcome of selection experiments. Oxford nanopore sequencing (ONT) is well suited to this as it allows for long read lengths and can be done rapidly with low-cost instrumentation. However, it suffers from a lower overall read accuracy and an uneven error profile. No current bioinformatics tools are well-suited to the challenge of deducing the composition and order of constituent members of combinatorial libraries from ONT reads. We introduce cONcat, an algorithm to identify the makeup of concatenated DNA fragments in a set of ONT sequencing reads from a pool of known fragments. cONcat uses an edit distance-based recursive covering algorithm for finding the best possible matchings between the fragments and the reads. In our experiments on simulated and experimental data, cONcat accurately detects the correct fragment coverings given the short fragment sizes (< 20 bp) and the sequencing errors present in ONT reads. However, we find that the high error rates in the start of ONT reads make it challenging to get confident coverage there, inferring a need for experimental strategies to avoid key sequence information in the start of reads.

Version published to 10.1371/journal.pone.0321246
Jul 24, 2025
Version published to 10.1101/2025.03.05.641699 on bioRxiv
Mar 10, 2025

Discuss this preprint

Listed in

Abstract

Article activity feed