Learning the Unseen: Data-Augmented Deep Learning for PTM Discovery with Prosit-PTM

Wassim Gabriel
Daniel P. Zolg
Victor Giurcoiu
Omar Shouman
Polina Prokofeva
Florian Seefried
Florian P. Bayer
Ludwig Lautenbacher
Armin Soleymaniniya
Karsten Schnatbaum
Johannes Zerweck
Tobias Knaute
Bernard Delanghe
Andreas Huhmer
Holger Wenschuh
Ulf Reimer
Guillaume Médard
Bernhard Kuster
Mathias Wilhelm

Abstract

Post-translational modifications (PTMs) are critical regulators of protein function, yet confidently identifying and localizing PTM sites across proteomes remains challenging. Integrating peptide property predictions into spectrum interpretation improves identification performance, but training data that enable zero-shot prediction across diverse PTMs are scarce. Here, we present a major expansion of the ProteomeTools dataset, comprising over 977,000 synthetic peptides that cover 22 PTM–residue combinations. We also developed Prosit-PTM, a model with chemically informed encoding and amino acid substitution-based augmentation, trained on our novel ground-truth dataset, that achieves accurate zero-shot predictions. Applied to modified peptides, Prosit-PTM enhances PTM-site localization in phosphoproteomics, increases identification of multiply modified peptides in histones, and enables data-driven rescoring for unseen modifications, such as those found on HLA peptides. Furthermore, the learned embeddings of amino acids and modifications capture physicochemical relationships underlying PTM-driven HLA presentation. Prosit-PTM is integrated into multiple open-source tools, enabling PTM-aware rescoring, site localization, spectral library generation, and beyond.

Version published to bioRxiv on Nov 10, 2025 (DOI: 10.1101/2025.11.07.687302)

Related articles

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations. Latest version: Dec 11, 2025

Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model

This article has 13 authors:
1. Peilin Xie
2. Xingchen Liu
3. Lantian Yao
4. Zhihao Zhao
5. Anming Yang
6. Jiahui Guan
7. Zijun Jiao
8. Zhihong Liu
9. Junwen Wang
10. Tzong-Yi Lee
11. Zigang Li
12. Bingyu Cui
13. Ying-Chih Chiang
This article has no evaluations. Latest version: Dec 11, 2025

Accurate, scalable, and unified single-cell atlas integration with scBIOT

This article has 2 authors:
1. Haihui Zhang
2. Peiwu Qin
This article has no evaluations. Latest version: Jan 19, 2026
