Multi-Armed Bandit-Based Adaptive Model Selection for Clinical AI Governance: An Infrastructure Approach to Safer AI Use in U.S. Healthcare
Abstract
Background
Clinical artificial intelligence (AI) systems are increasingly embedded in routine care delivery, including high-risk decision contexts. Despite widespread adoption, most clinical AI systems are deployed using static model selection strategies based on aggregate offline validation metrics. Such approaches can obscure context-dependent performance variation, allow systematic decision failures to persist during real-world use, and limit the ability of health systems to manage safety, operational cost, and equity at deployment time.

Objectives
To evaluate whether a multi-armed bandit governance framework that incorporates patient context, operational cost constraints, and delayed outcome feedback can serve as an adaptive deployment control mechanism for clinical AI systems.

Methods
We conducted a retrospective simulation study designed to emulate prospective, sequential deployment of clinical AI using structured electronic health record data from U.S. intensive care unit (ICU) admissions. Multiple independently trained and previously validated mortality prediction models were treated as candidate decision arms within a contextual multi-armed bandit framework. Model selection was governed by a fixed-confidence action-elimination policy. Rewards were defined by a composite function that explicitly balanced predictive performance, operational cost, and safety penalties motivated by prior audit evidence of clinically meaningful failure modes. Outcome feedback was modeled with realistic delays to reflect the temporal availability of clinical endpoints during deployment.

Results
Across 3,000 sequential ICU admissions, adaptive deployment governed by the multi-armed bandit framework achieved lower cumulative regret than static single-model deployment, uniform random selection, ensemble averaging, and heuristic switching strategies. Bandit-governed deployment demonstrated improved decision safety, reflected in lower false-negative rates, while maintaining comparable overall predictive accuracy. The governance layer also reduced mean operational cost per prediction and exhibited more stable performance across age and sex subgroups relative to static deployment.

Conclusions
Multi-armed bandit governance offers a principled and scalable approach to translating audit-identified risks in clinical AI into adaptive deployment control. By managing uncertainty, safety, and operational cost at the level of individual deployment decisions, the framework supports safer, more efficient, and more equitable use of clinical AI systems without requiring model retraining or disruptive workflow changes.
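The mechanism described in the Methods can be illustrated with a minimal sketch: a composite reward that trades off predictive performance, operational cost, and a safety penalty for false negatives, plus a fixed-confidence action-elimination rule that drops candidate models whose confidence interval no longer overlaps the best arm's. The function names, penalty weights, and Hoeffding-style confidence radius below are illustrative assumptions, not the study's actual specification.

```python
import math

def composite_reward(correct, cost, false_negative,
                     w_perf=1.0, w_cost=0.3, w_safety=2.0):
    """Composite reward balancing predictive performance, operational
    cost, and a safety penalty for false negatives. Weights are
    illustrative placeholders, not values from the study."""
    return w_perf * correct - w_cost * cost - w_safety * false_negative

def active_arms(means, counts, t, delta=0.05):
    """Fixed-confidence action elimination: keep only arms whose upper
    confidence bound still reaches the best arm's lower confidence bound."""
    def radius(n):
        # Hoeffding-style radius at confidence delta; infinite until tried.
        return math.sqrt(math.log(4 * t * t / delta) / (2 * n)) if n else float("inf")
    best_lcb = max(means[a] - radius(counts[a]) for a in means)
    return {a for a in means if means[a] + radius(counts[a]) >= best_lcb}
```

In a deployment loop, each admission would be routed to one of the surviving arms, the delayed outcome would later update that arm's running mean reward and count, and eliminated models would simply stop receiving traffic, with no retraining required.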