Customizing Large Language Models for Reliable and Interpretable Traffic Crash Prediction and Safety Interventions

Hao Frank Yang
Yang Zhao
Pu Wang
Yibo Zhao
Hongru Du

Abstract

Predicting crash events is crucial for understanding crash distributions and their contributing factors, thereby enabling the design of proactive traffic safety policy interventions. However, existing methods struggle to interpret the complex interplay among various sources of traffic crash data, including numeric characteristics, textual reports, crash imagery, environmental conditions, and driver behavior records. As a result, they often fail to capture the rich semantic information and intricate interrelationships embedded in these diverse data sources, limiting their ability to identify critical crash risk factors. In this research, we propose TrafficSafe, a framework that adapts Large Language Models (LLMs) to reframe crash prediction and feature attribution as text-based reasoning. A multi-modal crash dataset comprising 58,903 real-world reports, together with the associated infrastructure, environmental, driver, and vehicle information, is collected and textualized into the TrafficSafe Event dataset (totaling 12.74 million words). By customizing and fine-tuning state-of-the-art LLMs on this dataset, the proposed TrafficSafe LLM achieves a 42% average improvement in F1-score over baselines across multiple crash prediction tasks, particularly for severe crashes. To interpret these predictions and uncover contributing factors, we introduce TrafficSafe Attribution, a sentence-level feature attribution framework enabling conditional risk analysis. Findings show that alcohol-impaired driving is the leading factor in severe crashes, with aggressive and impairment-related behaviors contributing nearly twice as much to severe crashes as other driver behaviors. In addition, the co-occurrence of crash-contributing factors, such as alcohol-impaired driving, work zones, improper driving behaviors, and others, can significantly elevate risk levels.
Furthermore, TrafficSafe Attribution highlights pivotal features during model training, guiding strategic crash data collection for iterative performance improvements. The proposed TrafficSafe offers a transformative leap in traffic safety research based on foundation models, providing a blueprint for translating advanced artificial intelligence technologies into responsible, actionable, and life-saving outcomes. It is now reshaping how traffic researchers and policymakers approach road safety.

Version published to 10.21203/rs.3.rs-5947574/v1 on Research Square
Apr 29, 2025

A Comprehensive and Critical Survey of Large Language Model Inference and Feature Generation

This article has 1 author:
1. Snehil Shrivastava
This article has no evaluations. Latest version: Jun 16, 2025
TraceLM: Temporal Root-Cause Analysis with Contextual Embedding Language Models

This article has 1 author:
1. Bingxin Zhu
This article has no evaluations. Latest version: May 26, 2025
