Hard to Halt: Automation Bias in Agent-Driven Sequencing Prior Authorization Workflows

Mengshu Nie
Wendy Chung
Jessica Waxler
Michael Lee
Chunhua Weng
Rachel Lewis
Priyanka Ahimaz
Kai Wang
Cong Liu

Abstract

Purpose

Prior authorization (PA) for exome or genome sequencing (ES/GS) is a time-consuming process that delays rare disease diagnosis. Large language model (LLM)-based browser agents could automate these workflows, but their clinical reliability remains uncharacterized.

Methods

We developed a sandbox comprising a simulated ES/GS PA submission payer portal and a synthetic electronic health record (EHR) containing 836 patient records, spanning compliant profiles and deficient profiles with different types of issues. We evaluated Gemini 3 Pro, Gemini 3 Flash, and Claude Opus 4.5 on task completion rate, form completion accuracy, and appropriate withholding of submission for deficient profiles.
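
The abstract names the three evaluation metrics but does not define them precisely. The sketch below shows one plausible way to compute them from per-run outcomes, assuming a hypothetical EpisodeResult record for each agent run. The field names, the pooling of field-level accuracy over completed runs, and counting a run as appropriate withholding only when it finishes on a deficient profile without submitting are all assumptions, not the authors' protocol.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EpisodeResult:
    """Outcome of one agent run on one synthetic profile (hypothetical schema)."""
    profile_id: str
    is_deficient: bool   # ground truth: the profile has at least one issue
    completed: bool      # agent reached a terminal state (submitted or withheld)
    submitted: bool      # agent pressed submit on the simulated payer portal
    fields_correct: int  # form fields filled with the expected value
    fields_total: int    # form fields the agent was expected to fill


def summarize(results: Sequence[EpisodeResult]) -> dict[str, float]:
    """Compute the three metrics under the assumed definitions above."""
    if not results:
        raise ValueError("no results to summarize")

    # Task completion rate: share of all runs that reached a terminal state.
    completion = sum(r.completed for r in results) / len(results)

    # Form completion accuracy: field-level accuracy pooled over completed runs.
    done = [r for r in results if r.completed]
    total_fields = sum(r.fields_total for r in done)
    accuracy = sum(r.fields_correct for r in done) / total_fields if total_fields else 0.0

    # Appropriate withholding (assumed definition): share of runs on deficient
    # profiles that finished without submitting.
    deficient = [r for r in results if r.is_deficient]
    withheld = sum(r.completed and not r.submitted for r in deficient)
    withholding = withheld / len(deficient) if deficient else 0.0

    return {
        "task_completion_rate": completion,
        "form_completion_accuracy": accuracy,
        "appropriate_withholding_rate": withholding,
    }
```

Under these definitions, an agent that submits every profile scores zero on appropriate withholding no matter how accurately it fills the form, which is the pattern the Results report for the larger models.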

Results

Larger models achieved much higher task completion rates (Gemini 3 Pro 95.45%, Claude Opus 4.5 93.67%) than Gemini 3 Flash (56.05%), but almost never withheld submission for deficient profiles. Gemini 3 Flash, despite its lower completion rate, paradoxically showed better withholding performance (17.33%). In a non-agentic setting, Gemini 3 Pro correctly identified 91% of the issues in deficient profiles, indicating that the withholding failure is attributable to the browser-interaction setting rather than to limitations in the model's reasoning.

Conclusion

Current LLM-based browser agents exhibit a systematic bias toward form submission that poses risks in PA workflows. A modular, multi-agent architecture with human supervision is necessary for safe clinical deployment.
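
The abstract does not specify the proposed architecture. As a hypothetical illustration only, the sketch below separates a non-agentic review step (the setting in which the Results report high issue detection) from the submission step and makes withholding the default. The portal submission callback runs only when the review reports no issues and a human approves. ReviewReport, gated_submission, human_approves, and submit are illustrative names, not part of the authors' system.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Decision(Enum):
    SUBMIT = "submit"
    WITHHOLD = "withhold"


@dataclass
class ReviewReport:
    """Hypothetical output of a non-agentic review of one EHR profile."""
    profile_id: str
    issues: list[str] = field(default_factory=list)


def gated_submission(
    report: ReviewReport,
    human_approves: Callable[[ReviewReport], bool],
    submit: Callable[[str], None],
) -> Decision:
    """Submit only if the review found no issues AND a human signs off.

    The form-filling agent never decides to submit on its own; submission
    is a separate step behind two gates (hypothetical design).
    """
    if report.issues:
        # Deficient profile: withhold by default and leave issues for a human.
        return Decision.WITHHOLD
    if not human_approves(report):
        # No explicit human approval: also withhold.
        return Decision.WITHHOLD
    submit(report.profile_id)
    return Decision.SUBMIT
```

The design choice is that the browser agent never holds the authority to submit: a flagged issue or a missing human sign-off both fall through to withholding.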

Version published to 10.64898/2026.06.16.26355782 on medRxiv
Jun 18, 2026

Related articles

FlowBench: separating planning, fault recovery and interpretation in agentic bioinformatics

This article has 2 authors:
1. Alina Kurjan
2. Adam P. Cribbs
This article has no evaluations
Latest version Jun 16, 2026

Benchmarking large language models for ACMG/AMP variant interpretation and variant calling

This article has 1 author:
1. Manuel Corpas
This article has no evaluations
Latest version Jul 5, 2026

S2F-agent: Skill-grounded agent for Sequence-to-Function computational genomics workflows

This article has 2 authors:
1. Jiaqi Li
2. Zhiwei Bao
This article has no evaluations
Latest version May 15, 2026
