AstraPTM2: A Context-Aware Transformer for Broad-Spectrum PTM Prediction

Abstract

Post-translational modifications (PTMs) are covalent changes made to proteins after biosynthesis that shape their stability, localization, and function. While numerous computational tools exist for PTM site prediction, most truncate sequence context on full-length proteins, cover only a limited number of PTM types, and perform unevenly on rare modifications.

We present AstraPTM2, a transformer-based model that predicts 39 distinct PTM types on full-length sequences. By combining ESM-2 embeddings, AlphaFold2-derived structural features, and protein-level descriptors, AstraPTM2 captures both short-range motifs and long-range dependencies. Training uses a three-stage curriculum and adaptive focal loss to balance rare and common PTMs, followed by per-label affine calibration and optimized thresholds for well-calibrated probabilities.
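The class-imbalance machinery described above can be illustrated with a minimal sketch. The focal-loss form below follows Lin et al.'s standard binary formulation, and the calibration step is Platt-style per-label affine scaling of logits; the hyperparameter values (`gamma`, `alpha`, `a`, `b`) and function names are illustrative assumptions, not the values or API used by AstraPTM2:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights well-classified examples so that
    rare positive sites (uncommon PTM types) dominate the gradient.
    gamma and alpha are illustrative defaults, not AstraPTM2's settings."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    w = np.where(y == 1, alpha, 1 - alpha)   # class-balancing weight
    return float(-(w * (1 - pt) ** gamma * np.log(pt)).mean())

def affine_calibrate(logits, a, b):
    """Per-label affine calibration: sigmoid(a * logit + b), with (a, b)
    fitted separately for each PTM type on held-out data."""
    return 1.0 / (1.0 + np.exp(-(a * np.asarray(logits) + b)))

# Toy example: 4 candidate sites for one PTM type.
y = np.array([1, 0, 0, 1])
p = np.array([0.9, 0.2, 0.1, 0.4])
loss = focal_loss(p, y)                      # small: most sites are easy
probs = affine_calibrate([2.0, -1.0], a=0.8, b=-0.1)
```

A per-label decision threshold (rather than a fixed 0.5) can then be chosen on validation data to maximize F1 for each PTM type, which is what makes the calibrated probabilities usable for downstream experimental planning.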

On held-out test sets, AstraPTM2 achieves AUROC = 0.99 and macro-F1 = 0.59 across the 39 PTM types, with particularly strong performance on rare motif-driven PTMs such as O-linked glycosylation and sumoylation. Results are available through the Orbion web platform, which offers synchronized 2D and 3D visualizations, dual prediction modes (calibrated and exploratory), and reproducible exports to support downstream experimental planning.

AstraPTM2 can be accessed at https://www.orbion.life.