Random Forest Screening and Association Rule Mining for Crash Pattern Discovery

Abstract

We present a statistically principled pipeline for pattern discovery in crash data and apply it to fatal heavy-vehicle crashes in rural Interstate work zones using NHTSA FARS (2020 to 2023). The two-stage workflow first screens predictors of driver fatality at the scene via random forest variable importance and then mines interpretable association rules (Apriori) on the retained features. Rules are learned under minimum support = 0.03 and minimum confidence = 0.60 and ranked by lift; the 16 highest-lift rules are analyzed. The resulting patterns show consistent strength (lift = 3.18 to 3.85; mean confidence = 0.64; mean support = 3.46%) and most often combine disabling vehicle deformation, non-collision first harmful events (run-off-road or fixed object), and roadside locations; lack of airbag deployment appears frequently among antecedents, while speed-related items are rare. We comment on rule redundancy. The contribution is an end-to-end, reproducible applied-statistics workflow that includes data preprocessing templates for FARS, integration of screening and rule mining, and interpretation guidelines, yielding transparent surrogates that complement black-box severity models. Although the case study concerns work-zone heavy-vehicle crashes, the methodology is general and transferable to other domains requiring interpretable pattern mining. We also summarize the paper’s main shortcomings to situate the results, pairing these caveats with suggested extensions so readers can assess scope and applicability.
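To make the rule metrics concrete, the following is a minimal sketch of how support, confidence, and lift can be computed and how rules can be filtered at the stated thresholds (support 0.03, confidence 0.60) and ranked by lift. It is not the authors' code. The item names and toy transactions are hypothetical, the consequent is fixed to a single target item as an assumption, the search is brute force rather than a pruned Apriori, and the random forest screening stage is not shown.

```python
from itertools import combinations


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = frozenset(antecedent), frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)         # rows containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # rows containing the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # rows containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


def mine_rules(transactions, target, min_support=0.03, min_confidence=0.60, max_len=2):
    """Enumerate antecedents of up to max_len items (brute force, no Apriori pruning)."""
    items = sorted({i for t in transactions for i in t} - {target})
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            s, c, l = rule_metrics(transactions, antecedent, [target])
            if s >= min_support and c >= min_confidence:
                rules.append((antecedent, s, c, l))
    # Rank the surviving rules by lift, highest first.
    return sorted(rules, key=lambda r: r[3], reverse=True)


# Hypothetical toy transactions; these are NOT FARS records.
transactions = [frozenset(t) for t in [
    {"disabling_damage", "run_off_road", "driver_fatal"},
    {"disabling_damage", "run_off_road", "driver_fatal"},
    {"disabling_damage", "driver_fatal"},
    {"disabling_damage"},
    {"run_off_road"},
    {"minor_damage"},
    {"minor_damage"},
    {"minor_damage", "run_off_road"},
    {"minor_damage"},
    {"minor_damage"},
]]

rules = mine_rules(transactions, target="driver_fatal")
for antecedent, s, c, l in rules:
    print(f"{antecedent} -> driver_fatal: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

A lift above 1 means the antecedent occurs together with the target more often than would be expected if the two were independent, which is why ranking by lift surfaces the most distinctive patterns among rules that already pass the support and confidence floors.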
