Autonomous Liquid-handling Robotics Scripting for Accessible and Responsible Protein Engineering
This article has been reviewed by the following groups:
Listed in
- Evaluated articles (Arcadia Science)
Abstract
Laboratory automation enhances experimental throughput and reproducibility, yet widespread adoption is constrained by the expertise required for robotic programming. Here, we introduce LabscriptAI, a multi-agent framework that enables large language models to autonomously generate and validate executable Python scripts for protein engineering automation. Across a 55-task benchmark spanning four difficulty levels and multiple liquid-handling platforms, LabscriptAI achieved high success rates and outperformed both direct large language model baselines and a commercial solution. LabscriptAI automated cell-free protein synthesis and characterization of 298 green fluorescent protein (GFP) variants designed by 53 teams from five countries in a student challenge; the top variant achieved functional performance comparable to an extensively optimized benchmark while exploring distinct sequence space. Furthermore, LabscriptAI orchestrated distributed automation across a biofoundry and fume hood-enclosed systems to engineer enzyme variants utilizing formaldehyde, a sustainable but hazardous substrate, and identified a double mutant with a sevenfold increase in catalytic efficiency. The platform implements rigorous safety measures, including biosecurity screening, physical containment, and human-in-the-loop oversight, to safeguard autonomous protein engineering. LabscriptAI democratizes laboratory automation by eliminating programming barriers while promoting responsible research practices.
Article activity feed
- > LabscriptAI achieved 89.1% overall success in generating simulation-passing scripts, outperforming all baselines (Table 1)

  It would be interesting to know what percentage of these scripts you were able to run on the Opentrons; I've seen scripts that pass simulation but then immediately throw errors when uploaded to the robot.
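For context on the simulation-versus-hardware gap this comment raises, below is a minimal sketch of an Opentrons Python API v2 protocol file, the kind of script that `opentrons_simulate` checks. The labware names, pipette model, deck slots, and transfer volume are illustrative assumptions, not details taken from the paper; simulation validates the API calls and deck layout but cannot catch hardware-side failures such as calibration drift or labware mismatches at run time.

```python
# Minimal Opentrons Python API v2 protocol sketch (illustrative only).
# A protocol file defines `metadata` and a `run` function; the robot (or
# `opentrons_simulate protocol.py`) imports the module and calls `run`.
metadata = {
    "apiLevel": "2.16",  # assumed API level; pin to the robot's software version
    "protocolName": "GFP variant transfer sketch",
}

def run(protocol):
    # Load labware into assumed deck slots.
    plate = protocol.load_labware("corning_96_wellplate_360ul_flat", 1)
    tips = protocol.load_labware("opentrons_96_tiprack_300ul", 2)
    # Attach an assumed single-channel pipette on the right mount.
    p300 = protocol.load_instrument("p300_single_gen2", "right", tip_racks=[tips])
    # A transfer that simulation will accept, but which can still fail on
    # hardware if calibration or physical labware does not match the script.
    p300.transfer(50, plate["A1"], plate["B1"])
```

Note that nothing executes at import time; a script can therefore be syntactically and logically valid under simulation yet still error on the robot for purely physical reasons.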
- > Performance was compared against direct LLM implementations (GPT-5, Claude 4, DeepSeek V3.2, Gemini-2.5 Pro) and OpentronsAI, a commercial LLM-based solution (Table 1).

  I'd be interested to see this comparison done with other agentic approaches, like Claude Code or Cursor, if they were given the same kind of context up front.