Transformer-Based Framework for GPS and Timestamp Extraction from Dashcam Videos

Abstract

Dashboard cameras are becoming increasingly prevalent in vehicles, creating significant demand for reliable methods to extract essential metadata, such as timestamps, geolocation, and speed, from recorded footage. This metadata is essential for contextualizing events captured by these cameras, facilitating tasks such as accident reconstruction, security monitoring, and forensic analysis. However, many modern low-cost dashboard cameras overlay this text directly onto the video rather than logging metadata separately. This study aims to develop a device-independent solution for extracting metadata from dashboard camera footage. Optical Character Recognition (OCR) technology presents a promising avenue due to its hardware-agnostic capabilities. The research investigates the application of pre-trained OCR models, including Tesseract, KerasOCR, and EasyOCR, to extract overlaid data from images and videos captured by dashboard cameras, while also assessing the limitations and potential failure scenarios of these models. The findings indicate that a combination of traditional OCR algorithms and preprocessing techniques achieves an average Character Recognition Rate (CRR) of 50.6% and a Character Error Rate (CER) of 49.4%. To address the shortcomings of traditional OCR methods, a Transformer-based Optical Character Recognition (TrOCR) approach is proposed. Extensive training and validation of the TrOCR model, utilizing a mixed dataset of real and synthetic footage, significantly enhance character recognition accuracy to 84%, with a corresponding reduction in character error rate to 16%. Furthermore, the incorporation of post-processing techniques results in exceptional accuracy of 97% and a negligible character error rate of only 0.08%.
This research introduces a device-independent approach to extract metadata from dashboard camera footage, demonstrating the efficacy of the TrOCR method and underscoring the benefits of post-processing techniques for live data extraction.
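The CRR and CER figures quoted above are complementary (e.g. 50.6% and 49.4% sum to 100%), which is consistent with CER being computed as character-level edit distance normalized by the reference length, and CRR as its complement. The abstract does not specify the exact formula, so the following is a minimal sketch of that common definition; the function names `cer` and `crr` are illustrative, not taken from the paper.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance between reference and hypothesis strings
    (minimum insertions, deletions, and substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, 1):
        curr = [i]
        for j, hc in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (rc != hc)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance / reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)

def crr(ref: str, hyp: str) -> float:
    """Character Recognition Rate as the complement of CER."""
    return 1.0 - cer(ref, hyp)

# Example: one misread character in an 8-character timestamp overlay
print(cer("12:30:45", "12:38:45"))  # one substitution out of 8 chars
```

Under this definition, a single misrecognized digit in an eight-character timestamp yields a CER of 0.125, i.e. a CRR of 87.5% for that field.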
