Fast and flexible minimizer digestion with digest
Abstract
Minimizer digestion is an increasingly common component of bioinformatics tools, including tools for de Bruijn graph assembly and sequence classification. We describe Digest, a new open-source tool and library for efficient digestion of genomic sequences. It can produce digests based on the related ideas of minimizers, modimizers, or syncmers. Digest uses efficient data structures, scales well to many threads, and produces digests with the expected spacing between digested elements. Digest is implemented in C++17 with a Python API and is available as open source at https://github.com/VeryAmazed/digest .
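To make the three selection schemes concrete, the following is a minimal pure-Python sketch of how modimizers, window minimizers, and syncmers can be chosen from the k-mers of a sequence. It is an illustration under stated assumptions, not Digest's implementation or API: the function names, the BLAKE2b-based hash, and the naive per-window scan are all hypothetical choices made for readability, and the syncmer rule shown is one common window-based formulation (the smallest hash lies at either end of the window).

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (illustrative choice, not Digest's hash)."""
    return int.from_bytes(
        hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big"
    )


def kmer_hashes(seq: str, k: int) -> list[int]:
    """Hash every k-mer of seq, in order of start position."""
    return [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def modimizers(seq: str, k: int, mod: int) -> list[int]:
    """Start positions of k-mers whose hash is divisible by mod."""
    return [i for i, h in enumerate(kmer_hashes(seq, k)) if h % mod == 0]


def window_minimizers(seq: str, k: int, w: int) -> list[int]:
    """Start positions of the smallest-hash k-mer in each window of w k-mers.

    Ties are broken toward the leftmost k-mer, so selected positions are
    non-decreasing and comparing with the last pick removes duplicates.
    """
    hs = kmer_hashes(seq, k)
    picked: list[int] = []
    for start in range(len(hs) - w + 1):
        window = hs[start:start + w]
        pos = start + min(range(w), key=window.__getitem__)
        if not picked or picked[-1] != pos:
            picked.append(pos)
    return picked


def syncmers(seq: str, k: int, w: int) -> list[int]:
    """Start positions of windows of w k-mers whose smallest hash is at either end."""
    hs = kmer_hashes(seq, k)
    selected = []
    for start in range(len(hs) - w + 1):
        window = hs[start:start + w]
        smallest = min(window)
        if window[0] == smallest or window[-1] == smallest:
            selected.append(start)
    return selected


if __name__ == "__main__":
    seq = "ACGTTGCATGCCGATAGCTAGGCTA"
    print("modimizers:", modimizers(seq, k=4, mod=3))
    print("minimizers:", window_minimizers(seq, k=4, w=5))
    print("syncmers:  ", syncmers(seq, k=4, w=5))
```

The window minimizer scheme guarantees that consecutive selected k-mers start at most w positions apart, whereas modimizers offer no such guarantee and instead select each k-mer independently. The sketch rescans every window, which costs O(w) per position; a production tool would use a more efficient sliding-window minimum structure.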