AstraPTM: Context-Aware PTM Prediction Model for Large-Scale Proteins
Abstract
Post-translational modifications (PTMs) are critical molecular events that profoundly influence protein stability, localization, and function. While numerous computational tools exist for PTM site prediction, most struggle with large proteins and rely on a separate model for each modification. To address these challenges, we introduce AstraPTM, a transformer-based framework that predicts 25 PTMs in a single pass. By leveraging advanced protein embeddings (ESM2) and training on a high-coverage dbPTM dataset, AstraPTM captures both short-range sequence motifs and long-range interactions within proteins, with no limit on sequence length.
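The abstract does not say how AstraPTM removes the sequence-length limit. One common approach for embedding models with a bounded context, such as ESM2 (typically run on at most 1,022 residues per pass), is to score overlapping windows and average per-residue predictions where windows overlap. The sketch below assumes that approach. `window_spans`, `predict_full_length`, `toy_scorer`, and the 50% overlap are hypothetical choices for illustration, and the toy scorer stands in for a real embedding-plus-classifier pipeline.

```python
from typing import Callable, List, Tuple

# Assumed window size: ESM2 is commonly run on at most 1,022 residues
# per pass (1,024 tokens including the BOS/EOS special tokens).
WINDOW = 1022
STRIDE = 511  # 50% overlap between consecutive windows (an assumption)


def window_spans(length: int, window: int = WINDOW, stride: int = STRIDE) -> List[Tuple[int, int]]:
    """Return half-open [start, end) spans that together cover 0..length."""
    if not 0 < stride <= window:
        raise ValueError("stride must be in (0, window] so no residue is skipped")
    if length <= window:
        return [(0, length)]
    spans = []
    start = 0
    while True:
        end = min(start + window, length)
        spans.append((start, end))
        if end == length:
            break
        start += stride
    return spans


def predict_full_length(
    seq: str,
    score_window: Callable[[str], List[float]],
    window: int = WINDOW,
    stride: int = STRIDE,
) -> List[float]:
    """Score every residue of seq, averaging scores where windows overlap."""
    totals = [0.0] * len(seq)
    counts = [0] * len(seq)
    for start, end in window_spans(len(seq), window, stride):
        scores = score_window(seq[start:end])  # one score per residue in the window
        for offset, score in enumerate(scores):
            totals[start + offset] += score
            counts[start + offset] += 1
    return [t / c for t, c in zip(totals, counts)]


def toy_scorer(fragment: str) -> List[float]:
    """Hypothetical stand-in for a real model: flags S/T/Y residues."""
    return [1.0 if aa in "STY" else 0.0 for aa in fragment]


if __name__ == "__main__":
    protein = "MSTY" * 600  # 2,400 residues, longer than one window
    print(window_spans(len(protein)))
    scores = predict_full_length(protein, toy_scorer)
    print(len(scores), scores[:4])
```

Averaging is only one way to merge overlapping windows. Another is to keep, for each residue, the prediction from the window in which it sits farthest from an edge, where the model sees the most context.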
AstraPTM combines a binary classification module, which indicates whether a residue is modified, with a multi-label module that pinpoints the specific PTM types. This dual approach achieves high accuracy on well-represented PTMs (e.g., phosphorylation, glycosylation) while maintaining sensitivity for rarer modifications. In benchmarks against existing methods such as MusiteDeep and MIND-S, AstraPTM shows competitive or superior performance, with AUC-ROC above 99% for well-represented modifications, underscoring its versatility for proteome-wide annotation. Beyond prediction, the model's capacity to handle full-length proteins offers a powerful resource for researchers investigating PTM crosstalk and disease pathways, helping bridge the gap between large-scale omics data and targeted biomedical applications.
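The dual-head design described above can be sketched as two output heads over the same per-residue embedding. The abstract does not specify how the heads are combined, so this sketch makes two assumptions. First, the multi-label head uses an independent sigmoid per PTM type, so one residue can carry several modifications. Second, each type probability is gated by the binary head. `DualHeadPTM`, `call_sites`, the three-label `PTM_TYPES` subset, the random weights, and the toy embedding size are all hypothetical. Real ESM2 embeddings are much wider (1,280 dimensions for the 650M-parameter model), and a real model would learn its weights rather than draw them at random.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical label subset; the full model predicts 25 PTM types.
PTM_TYPES = ["phosphorylation", "N-glycosylation", "acetylation"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class DualHeadPTM:
    """Two linear heads over one per-residue embedding (random toy weights)."""

    def __init__(self, dim: int, n_types: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Binary head: is this residue modified at all?
        self.w_mod = [rng.gauss(0.0, 0.1) for _ in range(dim)]
        self.b_mod = 0.0
        # Multi-label head: one independent sigmoid per PTM type.
        self.w_type = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_types)]
        self.b_type = [0.0] * n_types

    def forward(self, emb: Sequence[float]) -> Tuple[float, List[float]]:
        p_mod = sigmoid(dot(self.w_mod, emb) + self.b_mod)
        # Assumed gating: P(type) = P(modified) * P(type | modified),
        # so no type probability can exceed the binary probability.
        p_type = [p_mod * sigmoid(dot(w, emb) + b) for w, b in zip(self.w_type, self.b_type)]
        return p_mod, p_type


def call_sites(model: DualHeadPTM, embeddings: Sequence[Sequence[float]],
               threshold: float = 0.5) -> List[Tuple[int, List[str]]]:
    """Return (position, PTM types) for each residue the binary head flags."""
    calls = []
    for pos, emb in enumerate(embeddings):
        p_mod, p_type = model.forward(emb)
        if p_mod >= threshold:
            types = [name for name, p in zip(PTM_TYPES, p_type) if p >= threshold]
            calls.append((pos, types))
    return calls


if __name__ == "__main__":
    model = DualHeadPTM(dim=8, n_types=len(PTM_TYPES), seed=1)
    # A zero embedding gives sigmoid(0) = 0.5 in every head.
    print(model.forward([0.0] * 8))  # (0.5, [0.25, 0.25, 0.25])
```

Gating ties the two heads together so that a confident type call implies a confident modification call. An alternative is to train and threshold the two heads independently.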