Connecting Syncmers to FracMinHash: similarities and advantages

Abstract

Motivation

Sketching methods provide scalable solutions for analyzing rapidly growing genomic data. Syncmers, a recent innovation in sketching, have proven effective and have been employed for read alignment. Syncmers share fundamental features with the FracMinHash technique, a recent modification of the popular MinHash algorithm that estimates similarity between sets of different sizes. Although previous researchers have demonstrated the effectiveness of syncmers in read alignment, their potential for broader use in metagenomic analysis (the primary purpose for which FracMinHash was designed) and in sequence comparison remains underexplored.

Results

We demonstrated that an open syncmer sketch is equivalent to a FracMinHash sketch when applied to k-mer-based similarities, yet it exhibits superior distance distribution and genomic conservation. Moreover, we extended the concept of k-mer truncation to open syncmers, creating multi-resolution open syncmers for metagenomic applications as well as flexible-sized seeding for sequence comparisons.
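
To make the comparison concrete, the following is a minimal Python sketch of the two selection rules, not the authors' implementation. It assumes a BLAKE2b-based 64-bit hash, s-mers ordered by hash value rather than lexicographically, and leftmost tie-breaking. The function names (`fracminhash`, `open_syncmers`, `jaccard`) and parameters (`scale`, `s`, `t`) are illustrative choices. Both rules decide whether to keep a k-mer from the k-mer alone, so the same k-mer is kept or dropped in every sequence, which is what allows either sketch to be used for set-similarity estimates.

```python
import hashlib


def kmers(seq, k):
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def h64(s):
    """64-bit hash of a string (assumed hash; any uniform hash would do)."""
    digest = hashlib.blake2b(s.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def fracminhash(seq, k, scale):
    """Keep k-mers whose hash falls in the lowest 1/scale of the hash space."""
    threshold = 2**64 // scale
    return {km for km in kmers(seq, k) if h64(km) < threshold}


def open_syncmers(seq, k, s, t=0):
    """Keep k-mers whose minimal s-mer (by hash) starts at offset t."""
    selected = set()
    for km in kmers(seq, k):
        smers = kmers(km, s)
        # Leftmost minimum: ties are broken by position.
        min_pos = min(range(len(smers)), key=lambda i: (h64(smers[i]), i))
        if min_pos == t:
            selected.add(km)
    return selected


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

As a usage example, `jaccard(open_syncmers(x, 21, 11), open_syncmers(y, 21, 11))` and `jaccard(fracminhash(x, 21, 11), fracminhash(y, 21, 11))` both estimate the k-mer Jaccard index of sequences `x` and `y`. With a random s-mer ordering, an open syncmer sketch is expected to retain roughly a 1/(k - s + 1) fraction of k-mers, so k = 21 and s = 11 gives a sampling rate comparable to `scale = 11`.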

Reproducibility

All analysis scripts can be found on GitHub.
