PARMIK: PArtial Read Matching with Inexpensive K-mers
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Environmental metagenomic sampling is instrumental in preparing for future pandemics by enabling early identification of potential pathogens and timely intervention strategies. Novel pathogens are a major concern, especially for zoonotic events. However, discovering novel pathogens often requires genome assembly, which remains a significant bottleneck. A robust metagenomic sampling that is directly searchable with new infection samples would give us a real-time understanding of outbreak origins dynamics. In this study, we propose PArtial Read Matching with Inexpensive K-mers (PARMIK), which is a search tool for efficiently identifying similar sequences from a patient sample (query) to a metagenomic sample (read). For example, at 90% identity between a query and a read, PARMIK surpassed BLAST, providing up to 21% higher recall. By filtering highly frequent k-mers, we reduced PARMIK's index size by over 50%. Moreover, PARMIK identified longer alignments faster than BLAST, peaking at 1.57x, when parallelizing across 32 cores.