Deep Reinforcement Learning and Imitation Learning for Autonomous Driving: A Review in the CARLA Simulation Environment


Abstract

Autonomous driving is a complex and fast-evolving domain at the intersection of robotics, machine learning, and control systems. This paper provides a systematic review of recent developments in reinforcement learning (RL) and imitation learning (IL) approaches for autonomous vehicle (AV) control, with a dedicated focus on the CARLA simulator, an open-source, high-fidelity platform that has become a standard for learning-based AV research. We analyze RL-based and IL-based studies, extracting and comparing their formulations of state, action, and reward spaces. Special attention is given to the design of reward functions, control architectures, and integration pipelines, and comparative graphs and diagrams illustrate performance trade-offs. We further highlight gaps in generalization to real-world driving scenarios, robustness under dynamic environments, and scalability of agent architectures. Despite rapid progress, existing autonomous driving systems exhibit significant limitations. For instance, studies show that end-to-end RL models can suffer performance degradation of up to 35% when exposed to unseen weather or town conditions, and IL agents trained solely on expert demonstrations exhibit up to 40% higher collision rates in novel environments. Reward misspecification also remains a critical issue: over 20% of reported failures in simulated environments stem from poorly calibrated reward signals. Generalization gaps, especially in RL, further manifest as task-specific overfitting, with agents failing up to 60% of the time when faced with dynamic obstacles not encountered during training. These persistent shortcomings underscore the need for more robust and sample-efficient learning strategies. Finally, we discuss hybrid paradigms that integrate IL and RL, such as Generative Adversarial Imitation Learning (GAIL), and propose future research directions.
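To make the reward-misspecification discussion concrete, the sketch below shows the kind of shaped per-step reward commonly found in the CARLA RL literature: a speed-tracking term, a lane-deviation penalty, a heading-alignment penalty, and a large terminal collision penalty. This is a minimal illustrative example, not the formulation of any specific surveyed paper; the function name `compute_reward`, its arguments, and the weight values are all assumptions chosen for clarity. Small changes to these weights are exactly the kind of calibration error the abstract identifies as a failure source.

```python
def compute_reward(speed_mps, target_speed_mps, lane_offset_m,
                   heading_error_rad, collided,
                   w_speed=1.0, w_lane=0.5, w_heading=0.3,
                   collision_penalty=100.0):
    """Illustrative shaped reward combining terms typical of CARLA RL
    studies. All weights are hypothetical and would need tuning."""
    if collided:
        # Terminal penalty: a collision dominates every other term.
        return -collision_penalty
    # Speed tracking: peaks at 1.0 at the target speed, decays linearly.
    r_speed = 1.0 - abs(speed_mps - target_speed_mps) / max(target_speed_mps, 1e-6)
    # Penalize lateral distance from the lane center (meters).
    r_lane = -abs(lane_offset_m)
    # Penalize misalignment with the lane direction (radians).
    r_heading = -abs(heading_error_rad)
    return w_speed * r_speed + w_lane * r_lane + w_heading * r_heading

# Example: 7 m/s against an 8 m/s target, 0.2 m off-center,
# 0.05 rad heading error, no collision.
print(compute_reward(7.0, 8.0, 0.2, 0.05, collided=False))
```

Note how the weights implicitly trade progress against safety: raising `w_speed` relative to the penalty terms can reward agents for cutting corners, one mechanism behind the poorly calibrated reward signals discussed above.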
