Assessing the 3D position of a car with a single 2D camera using mainstream DCNN models
Abstract
Deep Convolutional Neural Networks (DCNNs) are regarded as one of the foundations of computer vision due to their unparalleled ability to process visual data. This work explores the use of DCNNs to estimate the orientation of vehicles from a single 2D image. The experiments comprised 48 training scenarios, spanning four dataset variations and 12 models, each evaluated on four key metrics. Overall, the best-performing architecture was EfficientNet-B2, achieving an accuracy of 97.22% consistently across all dataset variations and demonstrating robustness to the preprocessing techniques applied. ResNet18 also delivered competitive results, achieving the highest recorded accuracy of 98.61% on the original dataset, while MobileNetV2 performed exceptionally well on the augmented and no-background datasets, likewise reaching 98.61%. EfficientNet-B5 initially underperformed on the original dataset but improved significantly with augmentation, achieving 97.22% accuracy. The study revealed that dataset preprocessing played a crucial role in model performance, with augmentation and background removal significantly boosting accuracy. Classification errors were further analyzed using SHAP values, highlighting the importance of specific car features, such as the front and rear sections, in determining orientation. Overall, the results confirmed that vehicle orientation estimation can be effectively approached as a classification problem. The ResNet family proved highly robust, while EfficientNet-B2 emerged as a strong contender due to its consistency. MobileNetV2's efficiency and strong performance make it a viable option for real-time applications. Future work should explore transformer-based architectures and evaluate model performance on real-world datasets with varying environmental conditions.
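The core idea of treating orientation estimation as a classification problem can be sketched as follows: a continuous yaw angle is discretized into a fixed number of orientation classes, which a DCNN then predicts from the image. The bin count below is a hypothetical choice for illustration only; the abstract does not state the class granularity used in the study.

```python
def yaw_to_class(yaw_deg: float, num_bins: int = 8) -> int:
    """Map a continuous yaw angle (degrees) to a discrete orientation class.

    num_bins=8 (45-degree sectors) is an assumed example value,
    not the granularity used in the paper.
    """
    yaw = yaw_deg % 360.0            # normalise to [0, 360)
    bin_width = 360.0 / num_bins     # width of each orientation sector
    # shift by half a bin so class 0 is centred on 0 degrees (facing forward)
    return int(((yaw + bin_width / 2) // bin_width) % num_bins)
```

Framed this way, the network's output layer simply has `num_bins` logits, and standard classification metrics (such as the accuracy figures reported above) apply directly.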