PromptBio-Bench: Benchmarking LLM-based Bioinformatics Agents for End-to-End Data Analysis

Wenbin Guo
Minzhe Zhang
Bowei Han
Youjia Ma
Yang Leng
Shishir Hebbar
Xiaoyuan Zhou
Wenhao Gu
Xiao Yang
Shashi Dhar

Abstract

Large language model (LLM)-based agents hold transformative potential for automating bioinformatics workflows; however, systematic evaluations of their capabilities remain limited, hindering a clear assessment of their readiness for real-world application. We introduce PromptBio-Bench, a comprehensive evaluation suite of 194 expert-curated tasks spanning bioinformatics and data science at varied difficulty levels, and an evaluation framework for structured file comparison and scoring against expert reference answers. Benchmarking three state-of-the-art agents revealed that Biomni and ToolsGenie achieved comparable performance, and accuracy declined markedly at higher difficulty levels across all agents. As foundation models and agent frameworks continue to evolve, PromptBio-Bench provides a valuable benchmark infrastructure for the community to systematically track the progress of agentic bioinformatics.

Version published to bioRxiv on May 8, 2026 (DOI: 10.64898/2026.05.05.723092)

Related articles

BioAutoML-FAST: an automated machine-learning platform for reusable and benchmarked biological sequence models

This article has 7 authors:
1. Breno L. S. de Almeida
2. Robson P. Bonidia
3. Martin Bole
4. Anderson Avila-Santos
5. Peter F. Stadler
6. Ulisses N. da Rocha
7. André C. P. L. F. de Carvalho
This article has no evaluations. Latest version: Apr 22, 2026

Claw4Science: A Dataset and Platform for the OpenClaw Scientific Agent Ecosystem

This article has 3 authors:
1. Mingyang Xu
2. Junhao Chen
3. Zaixi Zhang
This article has no evaluations. Latest version: Apr 1, 2026

End-to-end evaluation of pipelines for metagenome-assembled genomes reveals hidden performance gaps

This article has 6 authors:
1. Izaak Coleman
2. Jiong Ma
3. Gordon Qian
4. Yusheng Jiang
5. Aya Brown Kav
6. Tal Korem
This article has no evaluations. Latest version: Apr 9, 2026
