Neuro-Fuzzy Enhanced Deep Reinforcement Learning for Adaptive Urban Traffic Signal Control


Abstract

Deep Reinforcement Learning (DRL) has become a leading paradigm for adaptive traffic signal control, yet baseline implementations suffer from eight structural limitations spanning action discretization, reward noise, state interpretability, phase rigidity, undirected exploration, Q-value uncertainty, experience replay weighting, and sensor robustness. This paper proposes NF_PN_D3QN, a Neuro-Fuzzy enhanced extension of the Prioritized Noisy Dueling Double Deep Q-Network (PN_D3QN), which addresses all eight gaps through a unified neuro-fuzzy framework incorporating a Fuzzy Feature Extractor, Mamdani Reward Shaper, Green Duration FIS (Fuzzy Inference System), Phase Urgency scorer, Exploration Policy, softmax-entropy Q-Confidence gate, Fuzzy Priority PER (Prioritized Experience Replay), and Fuzzy Sensor Model. A methodological contribution is the documented three-version evolution of the Q-confidence gate: absolute gap thresholding produced 97% FIS dominance and no effective learning; relative gap normalization reduced but did not eliminate FIS persistence at convergence; softmax entropy correctly identified genuine network uncertainty, allowing FIS deferral to decline naturally from 11% at episode 1 to 0% by episode 30. Experiments across five traffic scenarios in Simulation of Urban Mobility (SUMO) show that NF_PN_D3QN achieves a last-25-episode mean waiting time of 6.40 seconds, statistically equivalent to PN_D3QN's converged 6.07 seconds, confirming that all DRL methods share a common performance ceiling well below Max-Pressure's 13.0 seconds. NF_PN_D3QN's primary advantage is sample efficiency: deployable performance is reached in 15 to 20 episodes versus 80 to 90 for PN_D3QN and 150 or more for D3QN, a four to eight times improvement with direct implications for live deployment where poor early decisions affect real commuters.
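The third version of the Q-confidence gate described above hinges on measuring genuine network uncertainty via the entropy of a softmax distribution over Q-values. A minimal sketch of such a gate is shown below; the helper name, the temperature of 1.0, and the deferral threshold of 0.8 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def softmax_entropy_gate(q_values, entropy_threshold=0.8):
    """Decide whether to defer to the FIS based on Q-value uncertainty.

    Illustrative sketch: threshold and temperature are assumptions.
    Returns the normalized softmax entropy in [0, 1] and a deferral flag.
    """
    q = np.asarray(q_values, dtype=float)
    # Numerically stable softmax over the action-value vector.
    z = q - q.max()
    p = np.exp(z) / np.exp(z).sum()
    # Shannon entropy, normalized by log(|A|) so 1.0 means fully uncertain.
    entropy = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))
    return entropy, entropy > entropy_threshold
```

With near-identical Q-values the normalized entropy approaches 1 and the gate defers to the FIS; as training sharpens the Q-function, one action dominates, entropy falls toward 0, and deferral stops naturally, matching the reported decline from 11% at episode 1 to 0% by episode 30.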