BioReason-Pro: Advancing Protein Function Prediction with Multimodal Biological Reasoning


Abstract

Protein function annotation is fundamental to understanding biological mechanisms, designing therapeutics, and advancing biomedical research. Current computational methods either rely on shallow sequence similarity or treat function prediction as isolated classification tasks, failing to capture the integrative reasoning across sequence, structure, domains, and interactions that expert biologists perform to infer function. We introduce BioReason-Pro, the first multimodal reasoning large language model (LLM) for protein function prediction that integrates protein embeddings with biological context to generate structured reasoning traces. A key input to BioReason-Pro is the set of GO term predictions made by GO-GPT, our autoregressive transformer that captures hierarchical and cross-aspect dependencies among GO terms. BioReason-Pro is trained via supervised fine-tuning on synthetic reasoning traces generated by GPT-5 for over 130K proteins and further optimized through reinforcement learning. It achieves an Fmax of 73.6% on GO term prediction and an LLM judge score of 8/10 on functional summaries, substantially outperforming previous methods. Evaluations with human protein experts show that BioReason-Pro annotations are preferred over ground-truth UniProt annotations in 79% of cases. Remarkably, BioReason-Pro de novo predicted experimentally confirmed binding partners, with per-residue attention localizing to the exact contact residues resolved in cryo-EM structures of those complexes. Together, GO-GPT and BioReason-Pro establish a framework for protein function prediction that combines precise ontology modeling with interpretable biological reasoning.
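For readers unfamiliar with the Fmax score reported above: it is the protein-centric maximum F-measure used in CAFA-style GO evaluations, computed by sweeping a score threshold and taking the best harmonic mean of average precision and recall. The sketch below is an illustrative implementation of that standard metric, not code from the paper; the data structures (`pred`, `truth`) are assumptions.

```python
def fmax(pred, truth, thresholds=None):
    """Protein-centric Fmax (CAFA-style), a minimal sketch.

    pred:  {protein_id: {go_term: score}}  -- predicted terms with confidence scores
    truth: {protein_id: set(go_term)}      -- experimentally annotated terms
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 100)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for prot, true_terms in truth.items():
            predicted = {g for g, s in pred.get(prot, {}).items() if s >= t}
            # Precision is averaged only over proteins with >= 1 prediction at t;
            # recall is averaged over all benchmark proteins.
            if predicted:
                precisions.append(len(predicted & true_terms) / len(predicted))
            recalls.append(len(predicted & true_terms) / len(true_terms))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best
```

At a threshold that keeps only the correct high-confidence term, precision and recall both reach 1.0, so a single well-ranked prediction yields Fmax = 1.0 for that protein.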

Article activity feed

  1. BioReason-Pro has several important limitations. The model was trained on synthetic reasoning traces generated by GPT-5 (Singh et al., 2025), which may contain subtle reasoning errors that propagate into the model. Furthermore, training requires proteins with experimental GO annotations, a resource that remains costly and limited in throughput to produce. Reasoning quality is heavily influenced by the availability of recognizable protein domains and degrades for proteins that lack identifiable InterPro annotations (Blum et al., 2024). Performance also degrades for extremely short peptides below 50 amino acids, where limited sequence information constrains the model’s ability to ground its functional predictions. For synthetic proteins that lack identifiable domains, such as the EvoAcr sequences (Merchant et al., 2025), predictions become heavily dependent on the organism label, producing divergent functional annotations and interaction partners across organisms for the same sequence. That said, several of these predictions were biologically coherent with known phage-encoded effector biology, suggesting that BioReason-Pro can nominate plausible hypotheses even in this challenging regime. LLM-based evaluation with GPT-5.1 may harbor systematic biases, and human expert evaluation covered 162 proteins, a sample size that limits statistical power for fine-grained comparisons. The model is also computationally expensive, requiring sequential inference through ESM3 (Hayes et al., 2024), GO-GPT, and the reasoning LLM. Finally, whether BioReason-Pro learns genuine biological reasoning or sophisticated imitation of reasoning patterns remains an open scientific question.

    I appreciate this transparency! Could you explain what practical implications these limitations have, and whether they limit the utility of this approach?

  2. We employed GPT-5.1, the most capable model available at the time of evaluation, as an expert judge,

    As you note later on, there is some concern about bias here, given that GPT-5 was also used to generate the training data. Do you have further plans to mitigate this possible bias?