Ensemble Machine Learning for Predicting TBM Penetration Rate with Limited Geotechnical Data
Abstract
This study evaluated the predictive performance of Random Forest, Bagged Trees, Support Vector Machines (SVM), and Least Squares Boosting (LSBoost) for estimating the rate of penetration (ROP) of a Tunnel Boring Machine (TBM). While all models achieved acceptable accuracy, LSBoost outperformed the others, showing the highest correlation (R = 0.965) and coefficient of determination (R² = 0.909), along with the lowest RMSE and MAE. Its performance remained robust after Z-score normalization, highlighting its ability to capture nonlinear parameter interactions and to generalize well on limited geotechnical datasets. Random Forest and Bagged Trees performed similarly, with Bagged Trees improving only slightly under normalization. SVM performed less effectively, indicating limited capacity to model complex TBM penetration behavior. Feature importance and SHAP analyses identified discontinuity spacing (DPW) and uniaxial compressive strength (UCS) as the primary controlling factors, while brittleness index (BI) was more influential within the SVM model. Agreement between Jacobian-based derivative analyses and SHAP results confirmed both mathematical sensitivity and engineering interpretability. Overall, TBM penetration prediction is a multivariate and inherently nonlinear problem, and LSBoost provides reliable, high-accuracy predictions even under data-constrained conditions. Combining SHAP- and PDP-based feature importance analyses enhances interpretability and supports engineering decision-making in TBM design and operation. These findings emphasize the applicability of machine learning approaches for accurate, interpretable, and robust TBM performance prediction.
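The evaluation metrics and the Z-score normalization named in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' implementation. The function names `zscore` and `regression_metrics` and the sample values are hypothetical, chosen purely for illustration. The sketch also assumes that R is the Pearson correlation between measured and predicted ROP and that R² is defined as 1 - SS_res/SS_tot, which the abstract does not state.

```python
import math

def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def regression_metrics(y_true, y_pred):
    """Return R, R2, RMSE and MAE for measured vs. predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r ** 2 for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)       # total sum of squares
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return {
        "R": cov / math.sqrt(ss_tot * var_p),             # Pearson correlation (assumed definition)
        "R2": 1.0 - ss_res / ss_tot,                      # coefficient of determination (assumed definition)
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }

# Made-up penetration rates for illustration only; not data from the study.
measured = [1.8, 2.4, 3.1, 2.0, 2.7, 3.5]
predicted = [1.9, 2.3, 2.9, 2.2, 2.8, 3.3]

metrics = regression_metrics(measured, predicted)
standardized = zscore(measured)
print(metrics)
```

Under these definitions R² can never exceed the square of the Pearson R, and the two coincide only when the predictions carry no systematic bias or scale error. This is one possible reason why a reported R² need not equal the square of the reported R.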