OligoN-design: A simple and versatile tool to design specific probes and primers from large heterogeneous datasets
Abstract
High-throughput environmental DNA sequencing has ushered ecological and evolutionary studies into the big data era. With thousands to millions of DNA sequences available, designing taxon-specific oligonucleotides has become a bottleneck for molecular studies that rely on primers for the polymerase chain reaction (PCR) or probes for fluorescence in situ hybridization (FISH). No software currently designs specific oligonucleotides starting from a custom set of sequences: existing tools rely on specific databases, alignments or phylogenetic trees, or cannot handle increasingly large environmental molecular datasets. Here we present oligoN-design, a versatile tool to design oligonucleotides specific to a set of target sequences while minimizing predicted binding to non-target sequences. OligoN-design is simple, reproducible, and adaptable to high-throughput sequencing data analyses. It requires only two FASTA files as input, one containing the target taxa and the other containing the non-target taxa. Because it uses standard bioinformatic formats, it integrates easily with other tools such as BLAST, VSEARCH or MAFFT. OligoN-design supports a range of strategies, which we present in detail, from unsupervised end-to-end usage to thorough expert usage. Starting from large, comprehensive ribosomal databases widely used by the community (i.e., PR2 and SILVA) and using the unsupervised function, we replicated known taxon-specific oligonucleotides in under 30 minutes, using up to 6 GB of RAM, on a personal laptop. OligoN-design v1 is available at github.com/MiguelMSandin/oligoN-design under the GNU General Public License version 3.0 and is easily installed via Bioconda (bioconda.github.io/recipes/oligon-design/README.html).
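The abstract does not describe the search algorithm itself. As a rough illustration of the general idea of designing oligonucleotides from a target FASTA file and a non-target (excluding) FASTA file, the Python sketch below enumerates every k-mer in the target sequences, keeps those present in a large fraction of targets, and discards those found in too many non-target sequences. This is not oligoN-design's implementation: the names `read_fasta` and `candidate_oligos` and the thresholds `min_target` and `max_excluding` are hypothetical. The sketch considers exact matches only (no mismatches, melting temperature or secondary structure) and reports k-mers on the target strand; a hybridization probe would be their reverse complement.

```python
from collections import defaultdict


def read_fasta(text):
    """Parse FASTA-formatted text into a {identifier: sequence} dict."""
    seqs, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                seqs[header] = "".join(chunks).upper()
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        seqs[header] = "".join(chunks).upper()
    return seqs


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_oligos(targets, excluding, k=18, min_target=0.8, max_excluding=0.0):
    """Hypothetical specificity filter: exact-match k-mers only.

    Returns (kmer, fraction_of_targets_hit, fraction_of_excluding_hit),
    sorted by highest target coverage, then lowest off-target hits.
    """
    if not targets:
        return []
    # Count how many target sequences contain each k-mer (once per sequence).
    target_hits = defaultdict(int)
    for seq in targets.values():
        for km in kmers(seq, k):
            target_hits[km] += 1
    n_t, n_e = len(targets), len(excluding)
    results = []
    for km, count in target_hits.items():
        frac_t = count / n_t
        if frac_t < min_target:
            continue  # not shared by enough target sequences
        # Fraction of non-target sequences containing an exact match.
        hits_e = sum(1 for seq in excluding.values() if km in seq)
        frac_e = hits_e / n_e if n_e else 0.0
        if frac_e <= max_excluding:
            results.append((km, frac_t, frac_e))
    results.sort(key=lambda r: (-r[1], r[2], r[0]))
    return results
```

For example, with two short target sequences and one non-target sequence, requiring every target to be matched and no non-target to be matched:

```python
targets = read_fasta(">t1\nACGTACGTGG\n>t2\nTTACGTACGTGG\n")
excluding = read_fasta(">n1\nACGTACGTCC\n")
candidate_oligos(targets, excluding, k=8, min_target=1.0)
# [('CGTACGTG', 1.0, 0.0), ('GTACGTGG', 1.0, 0.0)]
```

The shared k-mer `ACGTACGT` is dropped because it also occurs in the non-target sequence. Exhaustive substring search like this scales poorly with very large databases, which is one reason dedicated tools are needed for datasets such as PR2 or SILVA.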