AstraROLE2 & AstraSUIT2: Multi-Task Annotation Models for Functional Profiling of Proteins
Abstract
Most in silico protein characterisation tools address only one aspect of protein function, forcing researchers either to combine multiple models or to skip computational checks altogether. Here we introduce AstraROLE2 and AstraSUIT2, two transformer-based, multi-task annotators that deliver an integrated functional profile in a single pass.
A 1,351-dimensional input (ESM-2 CLS embeddings plus physicochemical Orbion enrichments) is passed through a 512-unit encoder to task-specific linear heads: four in AstraROLE2 (EC class, GO term, molecular pathway, protein category) and nine in AstraSUIT2 (cofactor group, specific cofactor, domain, host, membrane association type, transmembrane helix number, subcellular localization, quaternary category, quaternary stoichiometry). Models were trained on 730k UniProt proteins with stratified 70/15/15 splits; a class-weighted binary cross-entropy (BCE) loss and an Optuna hyper-parameter search were used to counter class imbalance. On hold-out sets the heads reached macro F1 = 0.84–0.98 and MCC = 0.85–0.98. The highest scores were seen for cofactor binding (0.98), membrane association type (F1 = 0.97) and top-level EC number (0.96); GO term classification was hardest (0.85). Against recent comparators (including DeepGOPlus and TargetP 2.0), the Astra models matched or exceeded performance, especially on metal-ion binding and cofactor binding. Additional tests on three novel proteins not included in the initial dataset showed good predictions for most labels, underscoring the models' potential for hypothesis generation.
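For readers less familiar with the two reported metrics, the following is a minimal sketch of how macro F1 and MCC can be computed for a multi-label head. It is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, predictions are assumed to be already thresholded to 0/1, and MCC is assumed to be averaged per label in the same way as F1 (the abstract does not state how MCC was aggregated).

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for two equal-length sequences of 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Binary Matthews correlation coefficient; 0.0 when undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def macro_scores(Y_true, Y_pred):
    """Unweighted mean of per-label F1 and per-label MCC.

    Y_true and Y_pred are lists of rows (one per protein); each row holds
    one 0/1 entry per label of the head. Per-label MCC averaging is an
    assumption of this sketch.
    """
    n_labels = len(Y_true[0])
    f1s, mccs = [], []
    for j in range(n_labels):
        col_true = [row[j] for row in Y_true]
        col_pred = [row[j] for row in Y_pred]
        tp, fp, fn, tn = confusion_counts(col_true, col_pred)
        f1s.append(f1_score(tp, fp, fn))
        mccs.append(mcc(tp, fp, fn, tn))
    return sum(f1s) / n_labels, sum(mccs) / n_labels


if __name__ == "__main__":
    # Toy data: four proteins, two labels (made up for illustration only).
    Y_true = [[1, 0], [1, 0], [0, 1], [0, 0]]
    Y_pred = [[1, 0], [0, 0], [0, 1], [0, 1]]
    macro_f1, macro_mcc = macro_scores(Y_true, Y_pred)
    print(f"macro F1 = {macro_f1:.3f}, macro MCC = {macro_mcc:.3f}")
```

Because macro averaging weights every label equally, rare classes influence the score as much as common ones, which is why it is a natural companion to class-weighted training on imbalanced annotation data.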
Overall, AstraROLE2 and AstraSUIT2 supplied fast, state-of-the-art multi-label protein annotation within one unified model network.