Evaluating open LLMs for agentic analysis orchestration in a typical biomedical lab

Anton Nekrutenko

Abstract

Agentic tools (software environments in which a large language model plans, calls external tools, executes code, and iterates with minimal human intervention) will run a substantial share of routine biomedical data analysis within the next few years. However, the per-call inference cost of frontier models is the bottleneck and adds up quickly. Here, we tested whether a free, locally runnable open-weight model could take over the repetitive execution steps at frontier accuracy. We used Claude’s Opus to author plans of increasing detail for per-sample variant calling, and ran six 2026-release open-weight implementer LLMs against those plans on a set of desktop GPUs. The model qwen3.6:27b reproduced frontier accuracy on every plan and matched Opus cell-for-cell on a 36-cell error-injection matrix. A sub-$2,000 Jetson or Apple Mac Mini sufficed for the implementer side. The open-weight model landscape changes on a timescale of months, so the specific implementer recommended here will be superseded; we provide the plans, harness, scoring code, and per-cell artifacts at https://github.com/nekrut/LLM-eval-paper as a framework for re-evaluating future models.

Version published to 10.64898/2026.05.13.724985 on bioRxiv
May 18, 2026

Related articles

Open-Rosalind: Tool-First Biomedical LLM Agents with Process-Aware Benchmarking

This article has 1 author:
1. Liang Wang
This article has no evaluations. Latest version May 8, 2026

Benchmarking and behavioral characterization of LLM agents for protein design

This article has 2 authors:
1. Jeonghyeon Kim
2. Philip Romero
This article has no evaluations. Latest version May 8, 2026

BioGAIP: A Scalable, User-Friendly and Robust LLM-Powered Multi-Agent System for Automated Bioinformatics Tasks

This article has 6 authors:
1. Jiayu Zhang
2. Pengfei Guo
3. Guanghui Jiang
4. Mengyu Zhou
5. Gang Wei
6. Ting Ni
This article has no evaluations. Latest version May 19, 2026
