AstraROLE2 & AstraSUIT2: Multi-Task Annotation Models for Functional Profiling of Proteins

Çağlar Bozkurt
Alexandra Vasilyeva
Aniruddh Goteti

Abstract

Most in-silico protein characterisation tools focus on only one aspect of protein function, forcing researchers to use multiple models or to bypass computational checks. Here we introduce AstraROLE2 and AstraSUIT2, two transformer-based, multi-task annotators that deliver an integrated functional profile in a single pass.

A 1,351-dimensional input (ESM-2 CLS embeddings plus physicochemical Orbion enrichments) is mapped by a 512-unit encoder and task-specific linear heads: four in AstraROLE2 (EC class, GO term, molecular pathway, protein category) and nine in AstraSUIT2 (cofactor group, specific cofactor, domain, host, membrane association type, transmembrane helix number, subcellular localization, quaternary category, quaternary stoichiometry). Models were trained on 730k UniProt proteins with stratified 70/15/15 splits; class-weighted BCE and Optuna hyper-parameter search countered class imbalance. On hold-out sets the heads reached macro F₁ = 0.84–0.98 and MCC = 0.85–0.98. The highest scores were seen for cofactor binding (0.98), membrane association type (F₁ = 0.97) and top-level EC number (0.96); GO term classification was hardest (0.85). Against recent comparators (incl. DeepGOPlus and TargetP 2.0), the Astra models matched or exceeded performance, especially on metal-ion binding and cofactor binding. Additional tests on three novel proteins not included in the initial dataset showed good predictions for most labels, underscoring the potential for hypothesis generation.
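
The shared-encoder, multi-head layout described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single dense ReLU layer standing in for the encoder, the per-head label counts, the `pos_weight` value and all function names are assumptions made for illustration. Only the 1,351-dimensional input, the 512-unit hidden width, the four AstraROLE2 head names and the use of class-weighted BCE come from the abstract.

```python
import numpy as np

INPUT_DIM = 1351   # from the abstract: ESM-2 CLS embedding plus Orbion enrichments
HIDDEN_DIM = 512   # from the abstract: 512-unit encoder

# Hypothetical label counts per head; the abstract does not state them.
ROLE2_HEADS = {"ec_class": 7, "go_term": 50, "pathway": 20, "protein_category": 10}


def init_params(heads, rng):
    """Randomly initialise one shared encoder layer and one linear head per task."""
    params = {
        "enc_w": rng.normal(0.0, 0.02, (INPUT_DIM, HIDDEN_DIM)),
        "enc_b": np.zeros(HIDDEN_DIM),
        "heads": {},
    }
    for name, n_labels in heads.items():
        params["heads"][name] = (
            rng.normal(0.0, 0.02, (HIDDEN_DIM, n_labels)),
            np.zeros(n_labels),
        )
    return params


def forward(params, x):
    """Return a dict of per-task logits for a batch x of shape (batch, INPUT_DIM)."""
    # Stand-in encoder (assumption): a single dense layer with ReLU.
    h = np.maximum(x @ params["enc_w"] + params["enc_b"], 0.0)
    # Every head reads the same shared representation h.
    return {name: h @ w + b for name, (w, b) in params["heads"].items()}


def weighted_bce(logits, targets, pos_weight):
    """Class-weighted binary cross-entropy on logits, averaged over all entries."""
    # Stable forms: log(sigmoid(z)) = -logaddexp(0, -z),
    #               log(1 - sigmoid(z)) = -logaddexp(0, z).
    log_p = -np.logaddexp(0.0, -logits)
    log_not_p = -np.logaddexp(0.0, logits)
    loss = -(pos_weight * targets * log_p + (1.0 - targets) * log_not_p)
    return float(loss.mean())


rng = np.random.default_rng(0)
params = init_params(ROLE2_HEADS, rng)
x = rng.normal(size=(4, INPUT_DIM))          # four synthetic "proteins"
logits = forward(params, x)
# Synthetic sparse targets, only to exercise the loss.
targets = {n: (rng.random(z.shape) < 0.1).astype(float) for n, z in logits.items()}
pos_weight = 9.0                              # hypothetical up-weighting of rare positives
total_loss = sum(weighted_bce(logits[n], targets[n], pos_weight) for n in logits)
```

Summing the per-head losses is the simplest way to train all tasks jointly; whether the authors weight the heads differently is not stated in the abstract.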

Overall, AstraROLE2 and AstraSUIT2 supplied fast, state-of-the-art multi-label protein annotation within one unified model network.
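
For readers unfamiliar with the two metrics quoted in the abstract, the following pure-Python sketch shows how macro F₁ and the Matthews correlation coefficient (MCC) are computed from binary predictions. The toy label matrices and function names are hypothetical, and how the authors aggregate MCC across the labels of a head is not stated in the abstract, so MCC is shown here for a single label only.

```python
from math import sqrt


def confusion(y_true, y_pred):
    """Count true/false positives and negatives for one binary label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def macro_f1(Y_true, Y_pred):
    """Unweighted mean of per-label F1; rows are samples, columns are labels."""
    n_labels = len(Y_true[0])
    scores = [
        f1([row[j] for row in Y_true], [row[j] for row in Y_pred])
        for j in range(n_labels)
    ]
    return sum(scores) / n_labels


# Toy example: four samples, two labels.
Y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
Y_pred = [[1, 0], [0, 1], [0, 1], [0, 1]]

macro = macro_f1(Y_true, Y_pred)
mcc_label0 = mcc([r[0] for r in Y_true], [r[0] for r in Y_pred])
```

Because macro F₁ averages per-label scores without weighting by label frequency, rare labels count as much as common ones, which is why it is a common choice for imbalanced annotation tasks.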

Version published to 10.1101/2025.06.21.660734 on bioRxiv
Jun 26, 2025

Related articles

A Survey on Efficient Protein Language Models

This article has 8 authors:
1. Shouren Wang
2. Debargha Ganguly
3. Vinooth Kulkarni
4. Wang Yang
5. Zhuoran Qiao
6. Daniel Blankenberg
7. Vipin Chaudhary
8. Xiaotian Han
This article has no evaluations
Latest version Dec 24, 2025

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations
Latest version Dec 11, 2025

Protein Language Models Rescue Variant Pathogenicity Prediction in Intrinsically Disordered Regions Through Synergistic Integration with Structure-Based Methods

This article has 1 author:
1. Hayden Farquhar
This article has no evaluations
Latest version Feb 4, 2026