Agricultural Injury Severity Prediction Using Integrated Data-Driven Analysis: Global Versus Local Explainability Using SHAP
Abstract
Despite the agricultural sector’s consistently high injury rates, formal reporting is often limited, leading to sparse national datasets that hinder effective safety interventions. To address this gap, our study introduces a comprehensive framework that uses advanced ensemble machine learning (ML) models to predict and interpret the severity of agricultural injuries. We use a unique, manually curated dataset of over 2,400 agricultural incidents from AgInjuryNews, a public repository of news reports describing incidents across the United States. We evaluated six ensemble models (Gradient Boosting, XGBoost, LightGBM, AdaBoost, HistGradientBoosting, and Random Forest) on their accuracy in classifying injury outcomes as fatal or non-fatal. A key contribution of our work is the novel integration of explainable artificial intelligence (XAI), specifically SHapley Additive exPlanations (SHAP), to overcome the “black-box” nature of complex ensemble models. The models demonstrated strong predictive performance, with most achieving an accuracy of approximately 0.71 and an F1-score of 0.81. Through global SHAP analysis, we identified key factors influencing injury severity across the dataset, such as helmet use, victim age, and the type of injury agent. Local SHAP analysis further revealed that specific variables, such as location and the victim’s role, can have varying impacts depending on the context of the incident. These findings provide actionable, context-aware insights for developing targeted policy and safety interventions for a range of stakeholders, from first responders to policymakers, and offer a powerful tool for a more proactive approach to agricultural safety.
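The abstract contrasts global SHAP analysis (which factors matter across the whole dataset) with local SHAP analysis (how a factor pushes one particular incident's prediction). What follows is a minimal, standard-library-only sketch of that distinction under clearly labeled assumptions. The feature names, the toy risk function `toy_model`, its coefficients, and the two example incidents are all hypothetical and are not the authors' data or fitted ensembles. The sketch also computes exact interventional Shapley values by brute-force enumeration instead of calling the `shap` package.

```python
from itertools import combinations, product
from math import factorial

# Hypothetical binary features, loosely named after factors the abstract mentions;
# the model below is a made-up toy, NOT one of the paper's fitted ensembles.
FEATURES = ["helmet", "older", "machinery", "rural"]


def toy_model(x):
    """Hypothetical fatality risk score for one incident (dict of 0/1 features)."""
    score = 0.30
    score -= 0.15 * x["helmet"]      # helmet use lowers risk
    score += 0.10 * x["older"]       # older victim raises risk
    score += 0.20 * x["machinery"]   # machinery as injury agent raises risk
    # Context-dependent effect: a remote location raises risk when machinery
    # is involved but lowers it otherwise.
    score += 0.25 * x["rural"] * x["machinery"]
    score -= 0.15 * x["rural"] * (1 - x["machinery"])
    return score


def coalition_value(model, x, subset, background):
    """v(S): mean model output with features in `subset` fixed to x and the
    remaining features taken from each background row (interventional SHAP)."""
    total = 0.0
    for row in background:
        z = {f: (x[f] if f in subset else row[f]) for f in FEATURES}
        total += model(z)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating all coalitions (feasible for few features)."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        contrib = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                gain = (coalition_value(model, x, s | {f}, background)
                        - coalition_value(model, x, s, background))
                contrib += weight * gain
        phi[f] = contrib
    return phi


def global_importance(model, rows, background):
    """Global view: mean absolute Shapley value of each feature over `rows`."""
    sums = {f: 0.0 for f in FEATURES}
    for x in rows:
        for f, v in shapley_values(model, x, background).items():
            sums[f] += abs(v)
    return {f: s / len(rows) for f, s in sums.items()}


# Background / evaluation data: every combination of the four binary features.
GRID = [dict(zip(FEATURES, bits)) for bits in product([0, 1], repeat=len(FEATURES))]

# Two hypothetical incidents that differ only in whether machinery was involved.
case_a = {"helmet": 0, "older": 1, "machinery": 1, "rural": 1}
case_b = {"helmet": 0, "older": 1, "machinery": 0, "rural": 1}

print("local A:", shapley_values(toy_model, case_a, GRID))
print("local B:", shapley_values(toy_model, case_b, GRID))
print("global :", global_importance(toy_model, GRID, GRID))
```

In this toy setup, `machinery` ranks highest in the global mean-|SHAP| view. `rural` has small global importance, yet its local attribution is positive for case A and negative for case B, because its effect in the toy model depends on whether machinery is involved. This mirrors, in miniature, the kind of context dependence the abstract attributes to local analysis. Exact enumeration grows exponentially with the number of features, so analyses with many features typically rely on sampling approximations or tree-specific algorithms such as TreeSHAP.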