De novo peptide databases enable protein-based stable isotope probing of microbial communities with up to species-level resolution

Simon Klaes
Christian White
Lisa Alvarez-Cohen
Lorenz Adrian
Chang Ding

Abstract

Background

Protein-based stable isotope probing (Protein-SIP) is a powerful approach that can directly link individual taxa to activity and substrate assimilation, elucidating metabolic pathways and trophic relationships within microbial communities. In Protein-SIP, peptides and corresponding taxa are identified by database matching, making database quality crucial for accurate analyses. For samples with unknown community composition, Protein-SIP typically employs either unrestricted reference databases or metagenome-derived databases. While (meta)genome-derived databases represent the gold standard, they may be incomplete and are typically resource-intensive to generate. In contrast, unrestricted reference databases can inflate the search space and require complex post-processing.

Results

Here, we explore the feasibility of using de novo peptide sequencing to construct peptide databases directly from mass spectrometry raw data. We then use the mass spectrometric data from labeled cultures to quantify isotope incorporation into specific peptides. We benchmark our approach against the canonical approach in which a sample-matching (meta)genome-derived protein sequence database is used on three different datasets: 1) a proteome analysis from a defined microbial community containing ¹³C-labeled E. coli cells, 2) time-course data of an anammox-dominated continuous reactor after feeding with ¹³C-labeled bicarbonate, and 3) a model of the human distal gut simulating a high-protein and high-fiber diet cultivated in either ²H₂O or H₂¹⁸O. Our results show that de novo peptide databases are applicable to different isotopes, detecting similar amounts of labeled peptides compared to sample-matching (meta)genome-derived databases, and also identify labeled peptides missed by this canonical approach. Furthermore, we show that peptide-centric Protein-SIP allows up to species-specific resolution and enables the assessment of activity related to individual biological processes. Finally, we provide access to our modular Python pipeline to assist the construction of de novo peptide databases and subsequent peptide-centric Protein-SIP data analysis (https://git.ufz.de/meb/denovo-sip).
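As a rough illustration of the database-construction step described above, the following minimal Python sketch turns a list of de novo peptide calls into a deduplicated FASTA-style peptide database. It is not taken from the authors' pipeline: the `DeNovoCall` record, the `build_peptide_database` and `to_fasta` helpers, the score scale, and the thresholds are all hypothetical. Collapsing isoleucine to leucine is likewise an assumption, made here only because the two residues have identical masses.

```python
from typing import Iterable, List, NamedTuple


class DeNovoCall(NamedTuple):
    """One de novo sequencing result (hypothetical record layout)."""
    sequence: str
    score: float  # assumed confidence score on a 0-100 scale


def build_peptide_database(
    calls: Iterable[DeNovoCall],
    min_score: float = 80.0,   # hypothetical confidence cutoff
    min_length: int = 7,       # hypothetical minimum peptide length
    collapse_il: bool = True,  # assumption: treat isobaric I and L as one residue
) -> List[str]:
    """Return the sorted, unique peptide sequences that pass both filters."""
    kept = set()
    for call in calls:
        seq = call.sequence.strip().upper()
        if collapse_il:
            seq = seq.replace("I", "L")
        if call.score >= min_score and len(seq) >= min_length:
            kept.add(seq)
    return sorted(kept)


def to_fasta(peptides: Iterable[str], prefix: str = "denovo") -> str:
    """Format peptides as FASTA text with sequential identifiers."""
    lines = []
    for index, seq in enumerate(peptides, start=1):
        lines.append(f">{prefix}_{index}")
        lines.append(seq)
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    example_calls = [
        DeNovoCall("PEPTIDEK", 95.0),
        DeNovoCall("PEPTIDEK", 70.0),     # duplicate; this copy fails the cutoff
        DeNovoCall("SHORT", 99.0),        # too short
        DeNovoCall("LOWSCOREPEP", 10.0),  # low confidence
        DeNovoCall("AIGNMENTK", 85.0),
    ]
    print(to_fasta(build_peptide_database(example_calls)), end="")
```

A database built this way contains only sequences observed in the sample itself, which is what keeps the search space small compared with an unrestricted reference database. The choice of cutoffs trades sensitivity against false identifications and would need tuning for real data.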

Conclusions

De novo peptide databases enable Protein-SIP of microbial communities without prior knowledge of the composition and can be used complementarily to (meta)genome-derived databases or as a standalone alternative in exploratory or resource-limited settings.

Version published to 10.1101/2024.11.25.625156 on bioRxiv
Nov 26, 2024
