Driver Injury Prediction and Factor Analysis in Passenger Vehicle-to-Passenger Vehicle Collision Accidents Using Explainable Machine Learning
Abstract
Vehicle accidents, particularly passenger vehicle-to-passenger vehicle (PV-PV) collisions, cause significant property damage and driver injuries, with substantial economic losses and health risks. Most existing studies focus on macro-level predictions, such as accident frequency, and lack the collision-level analysis needed for precise severity prediction. This study examines a range of accident-related factors, including environmental conditions, vehicle attributes, driver characteristics, pre-crash scenarios, and collision dynamics. Data from the National Highway Traffic Safety Administration's (NHTSA) Crash Report Sampling System (CRSS) and Fatality Analysis Reporting System (FARS) were integrated, then balanced with random over-sampling and under-sampling to address the imbalance across severity levels. The minimum redundancy maximum relevance (mRMR) algorithm was used for feature selection to minimize redundancy and identify key features. Five machine learning models were evaluated for severity prediction; XGBoost performed best, with 84.9% accuracy, 84.85% precision, 84.90% recall, and an F1-score of 84.87%. SHAP (SHapley Additive exPlanations) analysis was used to interpret the model and to analyze accident features, including their importance, dependencies, and combined effects on severity prediction. The resulting model predicts accident severity in PV-PV collisions with high accuracy across all severity levels, and the SHAP-based feature analysis at the global, local, and individual-case levels addresses a gap in PV-PV accident severity prediction and feature analysis.
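The class-balancing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how over-sampling and under-sampling were combined or what class size was targeted, so the code below assumes one simple scheme: every severity class is resampled to a common `target_size`, with larger classes under-sampled without replacement and smaller classes over-sampled with replacement. The function name `rebalance`, the severity labels, and the class counts are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def rebalance(records, labels, target_size, seed=0):
    """Randomly resample every class to exactly target_size rows.

    Assumed scheme (not taken from the study): classes with at least
    target_size rows are under-sampled without replacement; smaller
    classes keep all their rows and are padded by sampling with
    replacement.
    """
    rng = random.Random(seed)

    # Group the records by their severity label.
    by_class = defaultdict(list)
    for rec, lab in zip(records, labels):
        by_class[lab].append(rec)

    out_records, out_labels = [], []
    for lab, rows in by_class.items():
        if len(rows) >= target_size:
            # Majority class: random under-sampling, no duplicates.
            chosen = rng.sample(rows, target_size)
        else:
            # Minority class: keep every original row, then duplicate
            # randomly chosen rows until the class reaches target_size.
            chosen = rows + rng.choices(rows, k=target_size - len(rows))
        out_records.extend(chosen)
        out_labels.extend([lab] * target_size)
    return out_records, out_labels


# Toy data: integer row ids stand in for crash records.
labels = ["no_injury"] * 90 + ["minor"] * 25 + ["fatal"] * 5
records = list(range(len(labels)))

balanced_records, balanced_labels = rebalance(records, labels, target_size=30)
print(Counter(balanced_labels))
```

In practice, resampling of this kind is usually applied to the training split only, so that duplicated rows cannot appear in both training and evaluation data; whether the study did so is not stated in the abstract.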