AstraPTM2: A Context-Aware Transformer for Broad-Spectrum PTM Prediction

Abstract

Post-translational modifications (PTMs) are covalent changes made to proteins after biosynthesis that shape their stability, localization, and function. While numerous computational tools exist for PTM site prediction, most truncate sequence context on full-length proteins, cover only a limited number of PTM types, and perform unevenly on rare modifications.

We present AstraPTM2, a transformer-based model that predicts 39 distinct PTM types on full-length sequences. By combining ESM-2 embeddings, AlphaFold2-derived structural features, and protein-level descriptors, AstraPTM2 captures both short-range motifs and long-range dependencies. Training uses a three-stage curriculum and adaptive focal loss to balance rare and common PTMs, followed by per-label affine calibration and optimized thresholds for well-calibrated probabilities.
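The class-imbalance machinery described above can be illustrated with a minimal sketch. The focal-loss form below follows Lin et al.'s standard binary formulation, and the calibration step is Platt-style per-label affine scaling of logits; the hyperparameter values (`gamma`, `alpha`, `a`, `b`) and function names are illustrative assumptions, not the values or API used by AstraPTM2:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights well-classified examples so that
    rare positive sites (uncommon PTM types) dominate the gradient.
    gamma and alpha are illustrative defaults, not AstraPTM2's settings."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    w = np.where(y == 1, alpha, 1 - alpha)   # class-balancing weight
    return float(-(w * (1 - pt) ** gamma * np.log(pt)).mean())

def affine_calibrate(logits, a, b):
    """Per-label affine calibration: sigmoid(a * logit + b), with (a, b)
    fitted separately for each PTM type on held-out data."""
    return 1.0 / (1.0 + np.exp(-(a * np.asarray(logits) + b)))

# Toy example: 4 candidate sites for one PTM type.
y = np.array([1, 0, 0, 1])
p = np.array([0.9, 0.2, 0.1, 0.4])
loss = focal_loss(p, y)                      # small: most sites are easy
probs = affine_calibrate([2.0, -1.0], a=0.8, b=-0.1)
```

A per-label decision threshold (rather than a fixed 0.5) can then be chosen on validation data to maximize F1 for each PTM type, which is what makes the calibrated probabilities usable for downstream experimental planning.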

On held-out test sets, AstraPTM2 achieves AUROC = 0.99 and macro-F1 = 0.59 across the 39 PTM types, with particularly strong performance on rare motif-driven PTMs such as O-linked glycosylation and sumoylation. Results are available through the Orbion web platform, which offers synchronized 2D and 3D visualizations, dual prediction modes (calibrated and exploratory), and reproducible exports to support downstream experimental planning.

AstraPTM2 can be accessed at https://www.orbion.life.