AstraROLE & AstraSUIT: Multi-Task Annotation Models for Functional Profiling of Proteins
Abstract
Most in silico protein characterisation tools address only one aspect of protein function, forcing researchers either to combine multiple models or to skip computational checks. Here we introduce AstraROLE and AstraSUIT, two transformer-based, multi-task annotators that deliver an integrated functional profile in a single pass. A 1,351-dimensional input (ESM-2 CLS embeddings plus physicochemical Orbion enrichments) is mapped by a shared 512-unit encoder to task-specific linear heads: four in AstraROLE (EC class, GO term, molecular pathway, protein category) and six in AstraSUIT (cofactor, domain, host, membrane association type, transmembrane helix number, subcellular localisation). The models were trained on 730k UniProt proteins with stratified 70/15/15 splits; class-weighted binary cross-entropy (BCE) and Optuna hyperparameter search were used to counter class imbalance. On the hold-out sets, the heads reached macro F1 = 0.82–0.98 and MCC = 0.79–0.98. The highest scores were obtained for membrane association type (F1 = 0.98), top-level EC number (0.97) and cofactor binding (0.96); organism type association was the hardest task (0.82). Against recent comparators (including DeepGOPlus and TargetP 2.0), the Astra models matched or exceeded performance, especially on metal-ion binding and subcellular localisation. Additional tests on three novel proteins not included in the initial dataset gave good predictions for most labels, underscoring the models' potential for hypothesis generation.
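The architecture described above (a 1,351-dimensional input, a shared 512-unit encoder and task-specific linear heads trained with class-weighted BCE) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the ESM-2 and Orbion features are assumed to be precomputed, the encoder is stood in for by a single dense layer with ReLU, the head sizes in the hypothetical `ROLE_HEADS` dictionary are invented for illustration, per-head losses are assumed to be summed, and positives are weighted by `n_neg / n_pos`, which is one common form of class weighting that the abstract does not confirm.

```python
import math
import random

INPUT_DIM = 1351   # ESM-2 CLS embedding + Orbion physicochemical features (per the abstract)
HIDDEN_DIM = 512   # shared encoder width (per the abstract)

# Hypothetical label counts for the four AstraROLE heads; the real sizes are not given.
ROLE_HEADS = {"ec_class": 7, "go_term": 50, "pathway": 20, "category": 10}


def init_layer(in_dim, out_dim, rng):
    """Uniform init scaled by 1/sqrt(in_dim); returns (weight rows, zero biases)."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(weights, bias, x):
    """Dense layer: one dot product per output row, plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class MultiTaskAnnotator:
    """Shared encoder feeding one linear head per task; forward() returns logits."""

    def __init__(self, heads, seed=0):
        rng = random.Random(seed)
        self.encoder = init_layer(INPUT_DIM, HIDDEN_DIM, rng)
        self.heads = {name: init_layer(HIDDEN_DIM, n, rng) for name, n in heads.items()}

    def forward(self, x):
        # Assumption: a single dense layer + ReLU stands in for the 512-unit encoder.
        hidden = [max(0.0, h) for h in linear(*self.encoder, x)]
        return {name: linear(*layer, hidden) for name, layer in self.heads.items()}


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_bce_with_logits(logits, targets, pos_weight):
    """Mean over labels of w*y*(-log sigmoid(z)) + (1 - y)*(-log(1 - sigmoid(z)))."""
    terms = [
        w * y * softplus(-z) + (1 - y) * softplus(z)
        for z, y, w in zip(logits, targets, pos_weight)
    ]
    return sum(terms) / len(terms)


def pos_weights(label_matrix):
    """Assumption: up-weight positives of each label by n_neg / n_pos (1.0 if never positive)."""
    n = len(label_matrix)
    weights = []
    for column in zip(*label_matrix):
        n_pos = sum(column)
        weights.append((n - n_pos) / n_pos if n_pos else 1.0)
    return weights


# Usage: random stand-in features and all-negative targets, only to exercise the calls.
model = MultiTaskAnnotator(ROLE_HEADS)
rng = random.Random(1)
features = [rng.gauss(0.0, 1.0) for _ in range(INPUT_DIM)]
logits = model.forward(features)
# Assumption: the per-head losses are simply summed into one training objective.
total_loss = sum(
    weighted_bce_with_logits(z, [0] * len(z), [1.0] * len(z)) for z in logits.values()
)
```

Working with logits and a stable softplus avoids computing `log(sigmoid(z))` directly, which underflows for large negative logits; a deep-learning framework would normally supply an equivalent fused loss.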
Overall, AstraROLE and AstraSUIT provide fast, state-of-the-art multi-label protein annotation within a single unified model network.
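The macro F1 and MCC figures reported in the abstract can be illustrated with a small scoring helper. This sketch assumes that each head's labels are scored as independent binary columns and that both metrics are averaged uniformly across columns; the abstract does not state how MCC was aggregated. Returning 0.0 when a denominator is zero is a common convention, not something the source specifies.

```python
import math


def binary_counts(y_true, y_pred):
    """Confusion counts (tp, tn, fp, fn) for one binary label column."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); 0.0 if the label never appears in truth or prediction."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def macro_scores(true_matrix, pred_matrix):
    """Average per-label F1 and MCC over the label columns of one head."""
    columns = zip(zip(*true_matrix), zip(*pred_matrix))
    per_label = [binary_counts(t, p) for t, p in columns]
    macro_f1 = sum(f1(tp, fp, fn) for tp, _, fp, fn in per_label) / len(per_label)
    macro_mcc = sum(mcc(*counts) for counts in per_label) / len(per_label)
    return macro_f1, macro_mcc


# Usage: four proteins, two labels; rows are proteins, columns are labels.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_pred = [[1, 0], [0, 0], [1, 1], [0, 1]]
macro_f1, macro_mcc = macro_scores(y_true, y_pred)  # (0.75, 0.5)
```

Here the first label is predicted perfectly (F1 = MCC = 1.0), while the second has one hit, one miss and one false alarm (F1 = 0.5, MCC = 0.0). Macro averaging weights rare and common labels equally, which is why it is the stricter choice for imbalanced annotation vocabularies.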