Hamstring Strain Injury Risk in Soccer: An Exploratory, Hypothesis-Generating Prediction Model
Abstract
Hamstring strain injuries (HSI) are common in soccer and remain challenging to predict, as traditional risk factors often fail to capture the multifactorial nature of injury susceptibility. This prospective cohort study aimed to develop and internally validate a machine learning-assisted logistic regression model for predicting hamstring injuries in amateur soccer players using preseason clinical and strength-related variables. A total of 120 male players were followed for one competitive season (30 weeks). Baseline predictors included age, body mass index, previous injury, and bilateral isometric hip and knee strength measured via handheld dynamometry. Twenty initial predictors were reduced to ten through symmetrical uncertainty feature ranking before training a logistic regression model with elastic-net regularization (training set: n = 83; test set: n = 37) using nested four-fold cross-validation. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration metrics, and confusion matrices. During follow-up, 21 players sustained at least one HSI (32 events; 28% reinjuries), yielding an events-per-variable ratio of 2.1, below ideal thresholds and suggesting possible overfitting. On the independent test set, the model achieved an accuracy of 64.9%, AUC of 0.68 (95% CI 0.52–0.84), calibration slope of 0.85, and intercept of −0.12, with a sensitivity of 60% and specificity of 65.6%. Dominant-leg hip abduction strength was the only statistically significant predictor (OR = 0.82, 95% CI 0.70–0.96), while permutation importance analyses identified previous hamstring injury as the most stable contributor to model performance. Neither age nor hamstring isometric strength demonstrated predictive value. 
Although model discrimination was moderate and calibration indicated mild overfitting, findings reinforce the prognostic relevance of prior injury and suggest that reduced hip abduction strength may serve as an emerging candidate marker. This study, classified as a TRIPOD Category 2 model (development without external validation), provides preliminary, hypothesis-generating evidence supporting the use of multivariate strength and history-based predictors in future, larger-scale injury prediction research.
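The arithmetic behind the reported events-per-variable ratio and test-set classification metrics can be sketched as follows. This is a minimal illustration, not the authors' analysis code. The function names are hypothetical, and the confusion-matrix counts (3 true positives, 2 false negatives, 21 true negatives, 11 false positives) are not reported in the abstract; they are back-calculated here as an assumption, chosen because they match the stated test-set size (n = 37) and the stated sensitivity, specificity, and accuracy.

```python
def events_per_variable(n_events: int, n_predictors: int) -> float:
    """Events-per-variable (EPV): outcome events divided by candidate predictors."""
    return n_events / n_predictors


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # injured players correctly flagged
        "specificity": tn / (tn + fp),  # uninjured players correctly cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# From the abstract: 21 injured players, 10 retained predictors.
epv = events_per_variable(n_events=21, n_predictors=10)

# ASSUMPTION: counts back-calculated from the reported percentages
# (test set n = 37); the abstract does not report them.
metrics = classification_metrics(tp=3, fn=2, tn=21, fp=11)

print(f"EPV: {epv:.1f}")
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

Under these assumed counts the sketch reproduces the reported values (EPV 2.1, sensitivity 60.0%, specificity 65.6%, accuracy 64.9%). It also shows that the sensitivity estimate would rest on only five injured players in the test set, which fits the wide confidence interval reported for the AUC.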