cONcat: Computational reconstruction of concatenated fragments from long Oxford Nanopore reads
Abstract
Synthetic combinatorial DNA libraries are widely used to produce protein variants, to optimize binders, and for high-throughput studies of protein-DNA interactions. The libraries can be made by researchers or vendors, and high-throughput sequencing is used both for quality control and to study the outcome of selection experiments. Oxford Nanopore Technologies (ONT) sequencing is well suited to this, as it allows long read lengths and can be done rapidly with low-cost instrumentation. However, it suffers from lower overall read accuracy and an uneven error profile. No current bioinformatics tools are well suited to the challenge of deducing the composition and order of the constituent members of combinatorial libraries from ONT reads.
We introduce cONcat, an algorithm that identifies the makeup of concatenated DNA fragments in a set of ONT sequencing reads, given a pool of known fragments. cONcat uses an edit distance-based recursive covering algorithm to find the best possible matchings between the fragments and the reads. In our experiments on simulated and experimental data, cONcat accurately detected the correct fragment coverings given the short fragment sizes (< 20 bp) and the sequencing errors present in ONT reads. However, we find that the high error rates at the start of ONT reads make it challenging to obtain confident coverage there, which suggests a need for experimental strategies that avoid placing key sequence information at the start of reads.
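To make the idea of an edit distance-based recursive covering concrete, the following is a minimal Python sketch. It is not the cONcat implementation, and the abstract does not specify the scoring or the search strategy. The function names (`edit_distance`, `best_covering`), the parameters `max_dist` and `slack`, the score (covered bases minus edits), and the toy fragments are all assumptions made for illustration.

```python
from functools import lru_cache


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def best_covering(read, fragments, max_dist=2, slack=1):
    """Cover `read` with non-overlapping approximate fragment matches.

    Assumed objective (hypothetical): maximise covered bases minus edits.
    `slack` lets a match be slightly shorter or longer than the fragment,
    which absorbs insertions and deletions in the read.
    Returns (score, ((start, end, fragment_name, distance), ...)).
    """
    n = len(read)

    @lru_cache(maxsize=None)
    def solve(pos):
        if pos >= n:
            return 0, ()
        # Option 1: leave read[pos] uncovered and continue.
        best_score, best_spans = solve(pos + 1)
        # Option 2: place some fragment starting at pos.
        for name, seq in fragments.items():
            for length in range(max(1, len(seq) - slack), len(seq) + slack + 1):
                end = pos + length
                if end > n:
                    break
                dist = edit_distance(seq, read[pos:end])
                if dist > max_dist:
                    continue
                tail_score, tail_spans = solve(end)
                score = length - dist + tail_score
                if score > best_score:
                    best_score = score
                    best_spans = ((pos, end, name, dist),) + tail_spans
        return best_score, best_spans

    # Note: plain recursion is fine for this toy read; real ONT reads are
    # long enough that an iterative (bottom-up) form would be needed.
    return solve(0)


# Toy pool of known fragments (made-up sequences, each < 20 bp).
fragments = {"A": "ACGTACGTAC", "B": "TTGGCCAATT", "C": "GATTACAGAT"}
# Simulated read: A, then B with one deleted base, then C.
read = "ACGTACGTAC" + "TTGGCAATT" + "GATTACAGAT"
score, spans = best_covering(read, fragments)
print([name for _, _, name, _ in spans])  # ['A', 'B', 'C']
```

In this toy case the deletion in the middle fragment is absorbed by allowing a match one base shorter than the fragment. A sketch like this also illustrates why noisy read starts are problematic: if the first bases exceed `max_dist` for every fragment, they are simply left uncovered.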