Estimating Indirect Accident Cost Using a Two-Tiered Machine Learning Algorithm for the Construction Industry

Abstract

Accurately estimating total accident costs is essential for managing construction safety budgets. While direct costs are well documented, indirect costs (such as productivity loss, material damage, and legal expenses) are difficult to predict and often overlooked. Traditional ratio-based methods lack accuracy because indirect costs vary across projects and accident types. This study introduces a two-tiered machine learning framework for real-time indirect cost estimation. In the first tier, classification models (decision tree, random forest, k-nearest neighbor, and XGBoost) predict total cost categories; in the second, regression models (decision tree, random forest, gradient boosting, and light gradient boosting machine) estimate indirect costs. Using a dataset of 1,036 construction accidents collected over two years, the model achieved accuracies above 87% in classification and an R² of 0.95 with a training MSE of 0.21 in regression. Compared with conventional statistical and single-step models, it predicted more accurately, reducing the average deviation to $362.63 and in some cases achieving zero deviation. This framework enables more precise, real-time estimation of hidden costs, promoting better safety investment, reduced financial risk, and adaptive learning through retraining. When integrated with a national accident cost database, it supports ongoing improvement and informed risk management for construction stakeholders.
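
The two-tier idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the category predicted in the first tier selects the subset of past accidents used by the second tier, a linkage the abstract does not specify. The first tier uses a k-nearest-neighbor vote, one of the classifiers named above; the second tier swaps in a simple nearest-neighbor average in place of the tree-based regressors. The feature names, the records in TRAIN, and all function names are hypothetical.

```python
from collections import Counter
from math import dist
from statistics import mean

# Hypothetical toy records: (features, total-cost category, indirect cost in USD).
# Features are (worker-days lost, severity score); every value is made up.
TRAIN = [
    ((1.0, 1.0), "low", 200.0),
    ((2.0, 1.0), "low", 300.0),
    ((1.5, 2.0), "low", 250.0),
    ((8.0, 4.0), "high", 4000.0),
    ((9.0, 5.0), "high", 5000.0),
    ((10.0, 4.0), "high", 4500.0),
]


def nearest(rows, x, k):
    """Return the k rows whose feature vectors are closest to x (Euclidean)."""
    return sorted(rows, key=lambda row: dist(row[0], x))[:k]


def predict_category(x, k=3):
    """Tier 1: k-nearest-neighbor majority vote on the total-cost category."""
    votes = Counter(row[1] for row in nearest(TRAIN, x, k))
    return votes.most_common(1)[0][0]


def predict_indirect_cost(x, category, k=2):
    """Tier 2 (assumed linkage): average the indirect cost of the k nearest
    training rows that share the category predicted by tier 1."""
    same_category = [row for row in TRAIN if row[1] == category]
    return mean(row[2] for row in nearest(same_category, x, k))


def estimate(x):
    """Run both tiers and return (category, indirect-cost estimate)."""
    category = predict_category(x)
    return category, predict_indirect_cost(x, category)
```

Calling estimate((1.2, 1.0)) first classifies the accident into a cost category and then averages the indirect costs of its nearest neighbors within that category. Restricting the second tier to one category is one plausible way a two-step model could narrow the regression target compared with a single-step regressor.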
