Zero-Shot De Novo Peptide Sequencing with Open Post-Translational Modification Discovery
Abstract
Proteins play essential roles in biology, yet identifying their precise sequences and modifications remains challenging. De novo peptide sequencing addresses this by inferring sequences directly from mass spectrometry data without relying on protein databases. Recent deep learning models have significantly advanced this task but face a major dilemma: they require labeled training data to recognize post-translational modifications (PTMs), and such data are unavailable for most biologically relevant but rare or unknown modifications. We solve this long-standing problem by introducing RNovA, a transformer-based de novo sequencing algorithm enhanced with relative positional embeddings and reinforcement learning. RNovA enables open PTM discovery in a zero-shot setting, without retraining or a predefined list of candidate residues, while maintaining state-of-the-art performance on standard benchmarks. To demonstrate this capability, we identified peptides modified by kynurenine, an uncommon and biologically relevant PTM, in clinical samples from rheumatoid arthritis patients. RNovA overcomes key limitations of existing methods and enables exploration of the “dark proteome,” including novel proteins and unexpected modifications. This capability is widely needed in immunology, biomarker discovery, and biomedical research.