From Prediction to Prevention: Identifying Actionable Crash Factors Through ML and Narrative-Based Sensitivity Testing

Abstract

Traffic safety remains a critical area of transportation research because of its direct impact on public health. Reducing crash-related injuries and fatalities requires identifying the key factors that influence injury severity. This study evaluates the predictive performance of three machine learning (ML) models, Random Forest (RF), XGBoost, and AdaBoost, using a combination of structured crash data, highway characteristics, and unstructured crash narratives from police reports in Kentucky (2019–2023). RF was first applied for feature selection, identifying the top 100 predictors from more than 350 variables. Two natural language processing (NLP) techniques, Term Frequency–Inverse Document Frequency (TF-IDF) and Word2Vec, were then used to extract features from the crash narratives. These textual features were integrated with the structured data to develop individual-level injury severity prediction models. XGBoost combined with TF-IDF yielded the highest predictive performance. The model was trained on 32,000 labeled records and then applied to a larger dataset of 67,000 crashes, many of which lacked injury severity data. The predicted results served as a baseline (Rb) for a simulation-based sensitivity analysis. In this step, key input variables were systematically modified, and the new predictions (R1, R2, etc.) were compared with the baseline to assess how the severity distribution shifted across four classes: fatal, major, minor, and possible injury. The analysis revealed that occupant entrapment, restraint use, the "not under proper control" factor, ejection, speeding, and helmet use significantly influenced shifts in injury severity. Changes in these factors often moved predicted outcomes from more severe to less severe categories. These findings offer actionable insights for safety-focused policy and intervention strategies.
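
The baseline-versus-scenario comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_predict` is a hypothetical rule-based stand-in for the trained classifier (it is not the study's XGBoost model), the field names (`entrapped`, `ejected`, `restrained`, `speeding`) are invented, and the four records are made-up examples rather than Kentucky crash data. The sketch only shows the mechanics: predict a baseline distribution (Rb), overwrite one input variable for every record, predict again (R1), and compare the class counts.

```python
from collections import Counter

SEVERITY_CLASSES = ["fatal", "major", "minor", "possible"]


def toy_predict(record):
    """Hypothetical stand-in for a trained severity classifier (illustration only)."""
    score = 0
    score += 2 if record["entrapped"] else 0
    score += 2 if record["ejected"] else 0
    score += 1 if not record["restrained"] else 0
    score += 1 if record["speeding"] else 0
    if score >= 4:
        return "fatal"
    if score >= 2:
        return "major"
    if score == 1:
        return "minor"
    return "possible"


def severity_distribution(records, predict):
    """Count predicted severity classes, keeping classes with zero predictions."""
    counts = Counter(predict(r) for r in records)
    return {c: counts.get(c, 0) for c in SEVERITY_CLASSES}


def sensitivity_shift(records, predict, variable, new_value):
    """Set one variable to new_value for every record and compare with the baseline."""
    baseline = severity_distribution(records, predict)          # Rb
    modified = [{**r, variable: new_value} for r in records]    # copies, inputs untouched
    scenario = severity_distribution(modified, predict)         # R1
    shift = {c: scenario[c] - baseline[c] for c in SEVERITY_CLASSES}
    return baseline, scenario, shift


# Made-up records, for illustration only.
crashes = [
    {"entrapped": True, "ejected": True, "restrained": False, "speeding": True},
    {"entrapped": False, "ejected": True, "restrained": False, "speeding": False},
    {"entrapped": False, "ejected": False, "restrained": False, "speeding": False},
    {"entrapped": False, "ejected": False, "restrained": True, "speeding": False},
]

# Scenario: every occupant is restrained.
r_b, r_1, shift = sensitivity_shift(crashes, toy_predict, "restrained", True)
print("Rb:", r_b)
print("R1:", r_1)
print("shift:", shift)
```

Because the scenario re-predicts the same records, the per-class shifts always sum to zero; a negative count in a severe class paired with a positive count in a milder class is the kind of movement the abstract describes.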
