Safe Model-Free Q-Learning for Discrete-Time Fully Cooperative Multi-Input Systems with State and Control Constraints via Control Barrier Functions


Abstract

This paper proposes a safe model-free Q-learning algorithm for fully cooperative multi-input discrete-time nonlinear systems subject to both state and control constraints. In the fully cooperative setting, all control inputs share a common performance index and cooperate to stabilize the system while satisfying prescribed safety constraints. Unlike existing approaches that require knowledge of the system dynamics or neural-network-based identification, the proposed method employs tabular Q-learning to learn the optimal cooperative control policies directly from measured state transitions, without any model information. Discrete-time exponential control barrier functions are integrated as a safety filter, ensuring forward invariance of the safe set at every time step during both learning and deployment. The constrained value iteration framework guarantees convergence to the optimal safe policies without requiring initially admissible control policies. Theoretical analysis establishes both the safety guarantee, via the barrier-function conditions, and the convergence of the iterative scheme. Two numerical examples are presented: a two-input nonlinear system with linear state constraints and a three-input nonlinear system with an elliptical state constraint. Simulation results demonstrate that the proposed algorithm achieves a 100% safety rate across all tested initial conditions, whereas unconstrained Q-learning violates the safety constraints in 40–60% of cases. The model-free nature and guaranteed safety make the approach attractive for safety-critical applications where the system dynamics are unknown.
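The core ingredients described in the abstract — tabular Q-learning on a discretized state-action grid, with a discrete-time exponential control barrier function (CBF) acting as a safety filter on the admissible actions at every step — can be sketched as below. This is a minimal illustrative sketch, not the paper's algorithm: the 1-D system, the barrier function h(x) = 1 − x², the decay rate, and the grid sizes are all hypothetical choices, and for simplicity the filter here evaluates h on a simulated next state, whereas the paper enforces the condition model-free from measured transitions.

```python
import numpy as np

# Hypothetical scalar system x_{k+1} = 0.9 x_k + u_k (unknown to the learner
# in the paper's setting; simulated here only to generate transitions).
def step(x, u):
    return 0.9 * x + u

# Safe set C = {x : h(x) >= 0} with the illustrative barrier h(x) = 1 - x^2.
def h(x):
    return 1.0 - x * x

ALPHA = 0.5  # exponential CBF decay rate, 0 < ALPHA <= 1

def cbf_safe(x, u):
    """Discrete-time exponential CBF condition:
    h(x_{k+1}) - h(x_k) >= -ALPHA * h(x_k), i.e. h(x_{k+1}) >= (1-ALPHA) h(x_k)."""
    return h(step(x, u)) >= (1.0 - ALPHA) * h(x)

# Tabular Q-learning over discretized states/actions; the CBF filter restricts
# both exploratory and greedy action choices, so safety holds during learning.
states = np.linspace(-1.0, 1.0, 21)
actions = np.linspace(-0.3, 0.3, 7)
Q = np.zeros((len(states), len(actions)))
rng = np.random.default_rng(0)

def s_idx(x):
    return int(np.argmin(np.abs(states - x)))

def choose(x, eps=0.2):
    safe = [j for j, u in enumerate(actions) if cbf_safe(x, u)]
    if rng.random() < eps:
        return int(rng.choice(safe))          # safe exploration
    return safe[int(np.argmin(Q[s_idx(x), safe]))]  # safe greedy (cost minimization)

gamma, lr = 0.95, 0.5
always_safe = True
for episode in range(200):
    x = rng.uniform(-0.8, 0.8)                # start inside the safe set
    for k in range(30):
        j = choose(x)
        u = actions[j]
        x_next = step(x, u)
        cost = x * x + u * u                  # shared cooperative stage cost
        # Bootstrap only over actions that are safe at the next state.
        safe_next = [j2 for j2, u2 in enumerate(actions) if cbf_safe(x_next, u2)]
        target = cost + gamma * Q[s_idx(x_next), safe_next].min()
        Q[s_idx(x), j] += lr * (target - Q[s_idx(x), j])
        always_safe &= h(x_next) >= 0.0       # forward invariance check
        x = x_next
```

Because the CBF condition forces h(x_{k+1}) ≥ (1 − ALPHA)·h(x_k) ≥ 0 whenever h(x_k) ≥ 0, every state visited during learning stays in the safe set, which mirrors the abstract's claim of safety at every time step during both learning and deployment.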
