Fast and flexible minimizer digestion with digest

Abstract

Summary

Minimizer digestion is an increasingly common component of bioinformatics tools, including tools for de Bruijn graph assembly and sequence classification. We describe Digest, a new open-source tool and library for efficient digestion of genomic sequences. It can produce digests based on the related ideas of minimizers, modimizers, or syncmers. Digest uses efficient data structures, scales well to many threads, and produces digests with the expected spacing between digested elements.
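
To make the three digestion schemes concrete, the following is a minimal pure-Python sketch of textbook definitions of modimizers, window minimizers, and closed syncmers. It is not Digest's implementation or API: the function names, the BLAKE2b-based hash, and the parameter names k, w, s, and mod are all assumptions chosen for illustration, and the loops are deliberately naive rather than using the efficient data structures the tool relies on.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (illustrative choice, not Digest's hash)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def kmer_hashes(seq: str, k: int) -> list:
    """Hash of every k-mer in seq, indexed by the k-mer's start position."""
    return [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def modimizers(seq: str, k: int, mod: int) -> list:
    """Start positions of k-mers whose hash is divisible by mod."""
    return [i for i, h in enumerate(kmer_hashes(seq, k)) if h % mod == 0]


def minimizers(seq: str, k: int, w: int) -> list:
    """Start positions of the minimum-hash k-mer in each window of w
    consecutive k-mers (leftmost on ties), with repeats collapsed."""
    hashes = kmer_hashes(seq, k)
    picked = []
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        pos = start + window.index(min(window))  # leftmost minimum
        if not picked or picked[-1] != pos:
            picked.append(pos)
    return picked


def closed_syncmers(seq: str, k: int, s: int) -> list:
    """Start positions of k-mers whose minimum-hash s-mer is the first
    or the last s-mer inside the k-mer (assumed closed-syncmer rule)."""
    sub_hashes = kmer_hashes(seq, s)
    n_sub = k - s + 1  # number of s-mers contained in one k-mer
    picked = []
    for i in range(len(seq) - k + 1):
        window = sub_hashes[i:i + n_sub]
        smallest = min(window)
        if window[0] == smallest or window[-1] == smallest:
            picked.append(i)
    return picked
```

Under these definitions, and assuming a well-mixed hash, modimizers select roughly one k-mer in every mod, while the window minimizer scheme selects roughly two k-mers in every w + 1; the minimizer scheme additionally guarantees at least one selected k-mer in every window of w consecutive k-mers. The exact syncmer variant that Digest implements may differ from the closed-syncmer rule assumed here.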

Availability and implementation

Digest is implemented in C++17 with a Python API, and is available open-source at https://github.com/VeryAmazed/digest. The Python library is available on Bioconda. Rust bindings are available as a public crate at https://crates.io/crates/digest-rs.
