Improving Safety in Reinforcement Learning-based Artificial Pancreas Systems
Abstract
Reinforcement learning has been applied to develop advanced insulin dosing strategies for type 1 diabetes. However, its adoption in real-world care has been challenged by safety concerns, such as catastrophic failures resulting from insulin overdose. To address this problem, we present Safe Rollback Policy Optimization, a novel algorithm designed to improve safety in blood glucose control. This algorithm augments the state-of-the-art proximal policy optimization algorithm with a dual optimization strategy and a rollback mechanism that monitors the time spent in a healthy glucose range. If a newly updated policy degrades safety performance, the algorithm reverts to the most recent safe policy, thereby preventing the agent from reinforcing harmful behaviors. We evaluated the proposed method in silico on a cohort of virtual patients simulated with the FDA-accepted UVA/Padova type 1 diabetes simulator. The results demonstrate that our approach reduces failure rates compared to traditional policy optimization methods, especially in high-risk scenarios. On average, the algorithm reduces failure rates relative to proximal policy optimization from 2.79% to 0.38% in adults and from 4.93% to 1.42% in adolescents. These findings suggest that safety-aware learning algorithms can enable more reliable and clinically viable artificial pancreas systems.
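The core rollback idea described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function names (`time_in_range`, `update_policy`, `train_with_rollback`) and the dictionary-based policy representation are assumptions for exposition, and the PPO update and simulator rollout are replaced with placeholders. In the paper's setting, `time_in_range` would be estimated by rolling the policy out in the UVA/Padova simulator, and `update_policy` would be a PPO gradient step.

```python
import copy
import random

def time_in_range(policy):
    """Return the fraction of simulated time steps with glucose in the
    healthy target range under the given policy.
    Placeholder: a real implementation would roll the policy out in the
    UVA/Padova simulator; here we just read a stored attribute."""
    return policy["tir"]

def update_policy(policy):
    """Placeholder for a PPO-style update; the candidate policy's safety
    may improve or degrade (modeled here as random drift)."""
    new = copy.deepcopy(policy)
    new["tir"] = max(0.0, min(1.0, policy["tir"] + random.uniform(-0.1, 0.1)))
    return new

def train_with_rollback(policy, iterations=50, seed=0):
    """Safe-rollback loop: keep the most recent policy whose
    time-in-range did not degrade, and revert to it whenever a newly
    updated candidate is less safe."""
    random.seed(seed)
    safe_policy = copy.deepcopy(policy)
    safe_tir = time_in_range(safe_policy)
    for _ in range(iterations):
        candidate = update_policy(safe_policy)
        tir = time_in_range(candidate)
        if tir >= safe_tir:
            # Safety did not degrade: accept the candidate.
            safe_policy, safe_tir = candidate, tir
        # Otherwise discard the candidate and continue training from
        # the last known safe policy.
    return safe_policy, safe_tir
```

By construction, the returned policy's time-in-range never falls below that of the initial policy, which is the monotone-safety property the rollback mechanism is meant to provide; the dual optimization component of the full algorithm is omitted here.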