Safe Model-Free Q-Learning for Discrete-Time Fully Cooperative Multi-Input Systems with State and Control Constraints via Control Barrier Functions


Abstract

This paper proposes a safe model-free Q-learning algorithm for fully cooperative multi-input discrete-time nonlinear systems subject to both state and control constraints. In the fully cooperative setting, all control inputs share a common performance index and cooperate to stabilize the system while satisfying prescribed safety constraints. Unlike existing approaches that require knowledge of the system dynamics or neural-network-based identification, the proposed method employs tabular Q-learning to learn the optimal cooperative control policies directly from measured state transitions, without any model information. Discrete-time exponential control barrier functions are integrated as a safety filter, ensuring forward invariance of the safe set at every time step during both learning and deployment. The constrained value iteration framework guarantees convergence to the optimal safe policies without requiring initially admissible control policies. Theoretical analysis establishes both the safety guarantee, via the barrier-function conditions, and the convergence of the iterative scheme. Two numerical examples are presented: a two-input nonlinear system with linear state constraints and a three-input nonlinear system with an elliptical state constraint. Simulation results demonstrate that the proposed algorithm achieves a 100% safety rate across all tested initial conditions, whereas unconstrained Q-learning violates the safety constraints in 40–60% of cases. The model-free nature and guaranteed safety make the approach attractive for safety-critical applications where the system dynamics are unknown.
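The core ingredients described in the abstract — tabular Q-learning on a discretized state-action grid, with a discrete-time exponential control barrier function (CBF) acting as a safety filter on the admissible actions at every step — can be sketched as below. This is a minimal illustrative sketch, not the paper's algorithm: the 1-D system, the barrier function h(x) = 1 − x², the decay rate, and the grid sizes are all hypothetical choices, and for simplicity the filter here evaluates h on a simulated next state, whereas the paper enforces the condition model-free from measured transitions.

```python
import numpy as np

# Hypothetical scalar system x_{k+1} = 0.9 x_k + u_k (unknown to the learner
# in the paper's setting; simulated here only to generate transitions).
def step(x, u):
    return 0.9 * x + u

# Safe set C = {x : h(x) >= 0} with the illustrative barrier h(x) = 1 - x^2.
def h(x):
    return 1.0 - x * x

ALPHA = 0.5  # exponential CBF decay rate, 0 < ALPHA <= 1

def cbf_safe(x, u):
    """Discrete-time exponential CBF condition:
    h(x_{k+1}) - h(x_k) >= -ALPHA * h(x_k), i.e. h(x_{k+1}) >= (1-ALPHA) h(x_k)."""
    return h(step(x, u)) >= (1.0 - ALPHA) * h(x)

# Tabular Q-learning over discretized states/actions; the CBF filter restricts
# both exploratory and greedy action choices, so safety holds during learning.
states = np.linspace(-1.0, 1.0, 21)
actions = np.linspace(-0.3, 0.3, 7)
Q = np.zeros((len(states), len(actions)))
rng = np.random.default_rng(0)

def s_idx(x):
    return int(np.argmin(np.abs(states - x)))

def choose(x, eps=0.2):
    safe = [j for j, u in enumerate(actions) if cbf_safe(x, u)]
    if rng.random() < eps:
        return int(rng.choice(safe))          # safe exploration
    return safe[int(np.argmin(Q[s_idx(x), safe]))]  # safe greedy (cost minimization)

gamma, lr = 0.95, 0.5
always_safe = True
for episode in range(200):
    x = rng.uniform(-0.8, 0.8)                # start inside the safe set
    for k in range(30):
        j = choose(x)
        u = actions[j]
        x_next = step(x, u)
        cost = x * x + u * u                  # shared cooperative stage cost
        # Bootstrap only over actions that are safe at the next state.
        safe_next = [j2 for j2, u2 in enumerate(actions) if cbf_safe(x_next, u2)]
        target = cost + gamma * Q[s_idx(x_next), safe_next].min()
        Q[s_idx(x), j] += lr * (target - Q[s_idx(x), j])
        always_safe &= h(x_next) >= 0.0       # forward invariance check
        x = x_next
```

Because the CBF condition forces h(x_{k+1}) ≥ (1 − ALPHA)·h(x_k) ≥ 0 whenever h(x_k) ≥ 0, every state visited during learning stays in the safe set, which mirrors the abstract's claim of safety at every time step during both learning and deployment.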
