Towards Fully Autonomous Valet Parking: A Comprehensive Vision-and-Language Dataset and Benchmark Toolkit
Abstract
Autonomous Valet Parking (AVP) represents the final stage of autonomous driving applications, in which vehicles are expected to navigate complex and dynamic parking environments without human intervention. However, progress in AVP research is currently hindered by the lack of task-specific datasets, making it difficult to develop and validate AVP algorithms effectively. By formalizing the static object features of typical parking lots, including locations, obstacles, and attributes, this paper introduces the Vision-and-Language Parking (VLP) dataset, featuring 174 onboard panoramic images and 301 commands, making it the first dataset dedicated to AVP tasks. Additionally, we develop an agent-oriented benchmark AI toolkit with 14 baselines drawn from Rule-Based (RB) scripts, Reinforcement Learning (RL), Deep Learning (DL), and Multimodal Large Language Models (MLLMs). The results show that reinforcement learning faces significant trajectory-exploration challenges, deep learning struggles to generalize to out-of-distribution data, and MLLMs exhibit strong language understanding but weak analysis of environmental observations. The dataset and benchmark proposed in this paper provide a foundational basis for the development, sharing, and expansion of AVP algorithms.