Neuro-Fuzzy Enhanced Deep Reinforcement Learning for Adaptive Urban Traffic Signal Control
Abstract
Deep Reinforcement Learning (DRL) has become a leading paradigm for adaptive traffic signal control, yet baseline implementations suffer from eight structural limitations spanning action discretization, reward noise, state interpretability, phase rigidity, undirected exploration, Q-value uncertainty, experience replay weighting, and sensor robustness. This paper proposes NF_PN_D3QN, a Neuro-Fuzzy enhanced extension of the Prioritized Noisy Dueling Double Deep Q-Network (PN_D3QN), which addresses all eight gaps through a unified neuro-fuzzy framework incorporating a Fuzzy Feature Extractor, Mamdani Reward Shaper, Green Duration FIS (Fuzzy Inference System), Phase Urgency scorer, Exploration Policy, softmax-entropy Q-Confidence gate, Fuzzy Priority PER (Prioritized Experience Replay), and Fuzzy Sensor Model. A methodological contribution is the documented three-version evolution of the Q-confidence gate: absolute gap thresholding produced 97% FIS dominance and no effective learning; relative gap normalization reduced but did not eliminate FIS persistence at convergence; softmax entropy correctly identified genuine network uncertainty, allowing FIS deferral to decline naturally from 11% at episode 1 to 0% by episode 30. Experiments across five traffic scenarios in Simulation of Urban Mobility (SUMO) show that NF_PN_D3QN achieves a last-25-episode mean waiting time of 6.40 seconds, statistically equivalent to PN_D3QN's converged 6.07 seconds, confirming that all DRL methods share a common performance ceiling well below Max-Pressure's 13.0 seconds. NF_PN_D3QN's primary advantage is sample efficiency: deployable performance is reached in 15 to 20 episodes versus 80 to 90 for PN_D3QN and 150 or more for D3QN, a four to eight times improvement with direct implications for live deployment where poor early decisions affect real commuters.
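The softmax-entropy Q-confidence gate described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function name, the entropy threshold value, and the deferral convention are assumptions. The gate converts the agent's Q-values into a softmax distribution, measures its normalized Shannon entropy, and defers to the FIS only when entropy signals genuine network uncertainty.

```python
import numpy as np

def q_confidence_gate(q_values, entropy_threshold=0.8):
    """Softmax-entropy Q-confidence gate (illustrative sketch).

    Returns True when the Q-value distribution is too uncertain,
    signalling that control should defer to the FIS. The threshold
    value is a placeholder, not taken from the paper.
    """
    q = np.asarray(q_values, dtype=float)
    # Numerically stable softmax over the action-value vector.
    z = q - q.max()
    p = np.exp(z) / np.exp(z).sum()
    # Shannon entropy, normalized to [0, 1] by its maximum log(n).
    entropy = -(p * np.log(p + 1e-12)).sum()
    normalized = entropy / np.log(len(p))
    return normalized > entropy_threshold

# Near-uniform Q-values -> high entropy -> defer to the fuzzy system.
print(q_confidence_gate([1.01, 1.00, 0.99, 1.00]))  # True
# One clearly dominant action -> low entropy -> trust the DRL policy.
print(q_confidence_gate([5.0, 0.1, 0.2, 0.1]))      # False
```

Unlike absolute or relative gap thresholding, this normalized-entropy form is scale-invariant in the Q-values, which is consistent with the reported behavior of FIS deferral declining naturally as the network's estimates sharpen during training.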