MegaPX: fast and space-efficient peptide assignment method using IBF-based multi-indexing


Abstract

Motivation

A central problem in metaproteomic analysis is the often-unknown taxonomic composition of the analyzed microbiomes. The standard approach, a database search, requires either prior knowledge of which proteins and taxa to include in the protein reference database or tailored metagenome-derived databases, which are expensive and error-prone to generate. One strategy to circumvent this issue is de novo sequencing, in which peptide sequences are identified directly from mass spectra. However, these sequences must still be mapped back to potentially extensive databases. For this mapping, alignment-based approaches give robust and precise results, with the potential drawback of high memory usage and long run times.

Results

We present MegaPX, a software tool for rapidly classifying de novo peptide sequences against large protein databases. MegaPX, implemented in C++, uses an alignment-free, k-mer-based approach for taxonomic classification, with the option of generating mutated reference databases for error-tolerant searching. It uses various algorithms, including interleaved Bloom filters, to efficiently compute approximate membership queries, ensuring fast processing times while indexing and querying large databases in a multi-indexing fashion. We demonstrate the potential of MegaPX by analyzing different samples, including metaproteomic samples, against extensive reference databases, highlighting its use as a fast screening tool.
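
To illustrate the general idea of k-mer counting against an interleaved Bloom filter (IBF), the following is a minimal sketch and not MegaPX's actual implementation. The names `ToyIBF`, `index_protein`, and `count_kmer_hits`, the FNV-1a-style hashing, and all parameter values are assumptions made for illustration only. Each bin stands for one reference group (for example, a taxon), and a peptide is scored by how many of its k-mers may occur in each bin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy interleaved Bloom filter: `bins` Bloom filters of `bits_per_bin` bits
// each, stored so that bit i of every bin is adjacent in memory.
// Assumes bins > 0 and bits_per_bin > 0.
class ToyIBF {
public:
    ToyIBF(std::size_t bins, std::size_t bits_per_bin, std::size_t hashes)
        : bins_(bins), bits_(bits_per_bin), hashes_(hashes),
          data_(bins * bits_per_bin, false) {}

    std::size_t bins() const { return bins_; }

    // Set one bit per hash function in the column belonging to `bin`.
    void insert(const std::string& kmer, std::size_t bin) {
        assert(bin < bins_);
        for (std::size_t h = 0; h < hashes_; ++h)
            data_[position(kmer, h) * bins_ + bin] = true;
    }

    // One flag per bin: true means the k-mer MAY be present (false positives
    // are possible), false means it is definitely absent.
    std::vector<bool> query(const std::string& kmer) const {
        std::vector<bool> hit(bins_, true);
        for (std::size_t h = 0; h < hashes_; ++h) {
            const std::size_t row = position(kmer, h) * bins_;
            for (std::size_t b = 0; b < bins_; ++b)
                hit[b] = hit[b] && data_[row + b];
        }
        return hit;
    }

private:
    // Hypothetical hash choice: FNV-1a with a seed mixed into the offset.
    std::size_t position(const std::string& kmer, std::size_t seed) const {
        std::uint64_t h = 1469598103934665603ULL
                        ^ (static_cast<std::uint64_t>(seed) * 0x9E3779B97F4A7C15ULL);
        for (unsigned char c : kmer) {
            h ^= c;
            h *= 1099511628211ULL;
        }
        return static_cast<std::size_t>(h % bits_);
    }

    std::size_t bins_;
    std::size_t bits_;
    std::size_t hashes_;
    std::vector<bool> data_;
};

// Insert every k-mer of a reference protein into the given bin.
void index_protein(ToyIBF& ibf, const std::string& protein,
                   std::size_t bin, std::size_t k) {
    if (k == 0 || protein.size() < k) return;
    for (std::size_t i = 0; i + k <= protein.size(); ++i)
        ibf.insert(protein.substr(i, k), bin);
}

// For each bin, count how many k-mers of the peptide may occur in it.
std::vector<std::size_t> count_kmer_hits(const ToyIBF& ibf,
                                         const std::string& peptide,
                                         std::size_t k) {
    std::vector<std::size_t> counts(ibf.bins(), 0);
    if (k == 0 || peptide.size() < k) return counts;
    for (std::size_t i = 0; i + k <= peptide.size(); ++i) {
        const std::vector<bool> hit = ibf.query(peptide.substr(i, k));
        for (std::size_t b = 0; b < hit.size(); ++b)
            if (hit[b]) ++counts[b];
    }
    return counts;
}
```

In this sketch, a peptide would be assigned to the bin with the highest k-mer count, possibly subject to a minimum threshold. Because a Bloom filter never yields false negatives, a peptide that occurs verbatim in an indexed protein always reaches the full k-mer count for that bin. Counts for other bins can be inflated by false positives, and the rate depends on the bin size and the number of hash functions.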

Availability and implementation

MegaPX’s source code and all related documentation files are freely available in a GitHub repository (https://github.com/rki-mf2/MegaPX) under the MIT License.

Contact

MuthT@rki.de

Supplementary information

All supplementary tables and figures are included at the end of the preprint.
