NovoTax: prokaryotic strain identification from mass spectrometry-based proteomics data

Dennis Svedberg
André Mateus


Abstract

Summary

Traditional mass spectrometry-based proteomics typically requires prior knowledge of sample composition to match spectra to peptides. However, novel de novo peptide sequencing approaches can provide peptide sequences that identify the organism. Here, we introduce NovoTax, an end-to-end pipeline that identifies the closest prokaryotic proteome directly from raw bottom-up proteomics data. The approach combines peptide sequencing tools with an optimized implementation of peptide searching through an extensive proteome database. On a benchmark dataset of species isolates, we identified the reported species and strain in the majority of cases, and showed that in discordant cases NovoTax was likely correct. Interestingly, NovoTax was also able to identify contaminating species in samples. The algorithm also identified the most abundant organisms in bacterial communities. In summary, NovoTax provides strain-level identification of microbial samples, enabling the downstream use of traditional proteomics search engines for deeper proteome analysis.
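
The core idea described above, matching de novo peptide sequences against many candidate proteomes and reporting the best-supported one, can be illustrated with a minimal Python sketch. This is not NovoTax's implementation: the function names (`normalize`, `rank_proteomes`), the toy sequences, and the simple "count distinct matching peptides" score are all assumptions made for illustration, and the naive substring search stands in for the optimized search the summary mentions. The only domain fact relied on is that leucine and isoleucine have identical mass, so the sketch treats them as equivalent.

```python
from collections import Counter


def normalize(seq: str) -> str:
    """Map isoleucine (I) to leucine (L); the two residues have identical mass."""
    return seq.upper().replace("I", "L")


def rank_proteomes(peptides, proteomes):
    """Rank proteomes by the number of distinct peptides found in their proteins.

    peptides  : iterable of peptide strings (hypothetical de novo output)
    proteomes : dict mapping a proteome name to a list of protein sequences
    Returns a list of (proteome name, matched peptide count), best first.
    """
    unique_peptides = {normalize(p) for p in peptides}
    hits = Counter()
    for name, proteins in proteomes.items():
        normalized = [normalize(p) for p in proteins]
        # A peptide counts once per proteome if it occurs in any protein.
        hits[name] = sum(
            any(pep in prot for prot in normalized) for pep in unique_peptides
        )
    return hits.most_common()


# Toy data (invented): two near-identical "strains" differing by single residues.
proteomes = {
    "strain_A": ["MKTAYIAKQRQISFVK", "GSHMLEDPVDAFK"],
    "strain_B": ["MKTAYLAKQRQISFVR", "GSHMLEDPTDAFK"],
}
peptides = ["TAYIAK", "QISFVK", "LEDPVDAFK"]

ranking = rank_proteomes(peptides, proteomes)
print(ranking)
```

In this toy case all three peptides match `strain_A`, while only one matches `strain_B`, so the peptides that cover strain-specific residues are what separate the two candidates. A real database search over many proteomes would need an indexed lookup rather than repeated substring scans.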

Availability and implementation

The open-source software is available on GitHub at https://github.com/mateuslab-prot/NovoTax

Version published to 10.64898/2026.04.02.715787 on bioRxiv
Apr 6, 2026

Related articles

De novo protein discovery in non-model organisms
Asif Ali
Latest version May 13, 2026

A novel method to select Reference Proteomes in UniProt
Pedro Raposo, Juan Sebastian Martinez Marin, Gyuri Kim, Giuseppe Insana, Dushyanth Jyothi, Jie Luo, Tanushree Tunstall, UniProt Consortium, Sandra Orchard, Martin Steinegger, Maria Martin
Latest version May 14, 2026

Bridging genomes and peptidomes: hybrid sequencing reveals conserved bioactive peptides in crustaceans
Lauren Fields, Jiangrong Qin, Angel E. Ibarra, Kendra G. Selby, Tong Gao, Tina C. Dang, Haiyan Lu, Lingjun Li
Latest version May 7, 2026
