A time-sequenced approach to machine learning prognostic modelling with implementation on running-related injury prediction

Abstract

Background

The use of machine learning (ML) methods in medical prognostic modelling is gaining popularity, yet all currently available source models were designed from a general mathematics perspective. These models encounter limitations when embedding discipline-specific information, which restricts their practical interpretability. This study aimed to introduce two novel prognostic ML source models designed with an area-specific approach, to test their performance against commonly used ML methods, and to explore their interpretability.

Methods

Measurements associated with multidisciplinary risk factors (genetics, history, neuromuscular capacity, biomechanics, body composition, nutrition, training) were collected from competitive endurance runners, who were subsequently monitored weekly over 12 months for running-related injuries (RRIs). Data were fitted with commonly used ML methods and the novel models using a stratified 10-fold cross-validation framework for performance comparison. Interpretable feature interactions were tested for statistical significance, and extracted feature importance scores were tested for correlation with Shapley Additive Explanations (SHAP) values.
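The evaluation framework described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, a weekly feature matrix X with binary RRI labels y (here replaced by synthetic placeholder data), and two illustrative baseline models rather than the study's actual method set.

```python
# Minimal sketch of a stratified 10-fold cross-validation comparison,
# assuming scikit-learn; data and model choices are illustrative placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score, accuracy_score, recall_score

# Placeholder data standing in for the weekly runner samples and RRI labels.
X, y = make_classification(n_samples=6181, n_features=30,
                           weights=[0.85, 0.15], random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, model in models.items():
    aucs, accs, sens, specs = [], [], [], []
    for train_idx, test_idx in cv.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        prob = model.predict_proba(X[test_idx])[:, 1]
        pred = (prob >= 0.5).astype(int)
        aucs.append(roc_auc_score(y[test_idx], prob))
        accs.append(accuracy_score(y[test_idx], pred))
        sens.append(recall_score(y[test_idx], pred))                # sensitivity
        specs.append(recall_score(y[test_idx], pred, pos_label=0))  # specificity
    print(f"{name}: AUC={np.mean(aucs):.3f} Acc={np.mean(accs):.3f} "
          f"Sens={np.mean(sens):.3f} Spec={np.mean(specs):.3f}")
```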

Results

6,181 valid weekly samples were collected from 142 competitive endurance runners. The novel methods’ performances (AUC 0·736-0·753, Accuracy 0·822-0·849, Sensitivity 0·376-0·455, Specificity 0·859-0·896) matched those of commonly used ML methods (AUC 0·649-0·784, Accuracy 0·662-0·857, Sensitivity 0·337-0·568, Specificity 0·671-0·904). Pairwise feature interactions revealed stable patterns (p<0·001). Method-specific, computationally efficient feature importance scores correlated moderately with SHAP values (r=0·12-0·72), with correlations increasing as model parameterization increased.
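The importance-versus-SHAP comparison can be illustrated with the sketch below. It is a hypothetical example, not the study code: it assumes the shap package and the fitted random forest and data from the previous sketch, and uses Pearson correlation between built-in importances and mean absolute SHAP values.

```python
# Sketch of correlating a model's built-in feature importances with
# mean |SHAP| values; assumes the `shap` package and the objects above.
import numpy as np
import shap
from scipy.stats import pearsonr

rf = models["random_forest"].fit(X, y)
sv = shap.TreeExplainer(rf).shap_values(X)
# shap may return a list (one array per class) or a single array, depending on version.
sv_pos = sv[1] if isinstance(sv, list) else (sv[..., 1] if sv.ndim == 3 else sv)
mean_abs_shap = np.abs(sv_pos).mean(axis=0)

r, p = pearsonr(rf.feature_importances_, mean_abs_shap)
print(f"Pearson r between importances and mean |SHAP|: {r:.2f} (p={p:.3g})")
```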

Conclusion

The novel methods showed performance comparable to common ML methods, with better interpretability. Interpretability improved with increasing parameterization, suggesting performance may further improve with larger datasets and more features. Future research should perform higher-quality validations using larger datasets before these methods can be widely adopted in prognostic modelling research.

Funding

This research is supported by the Alan Turing Institute Enrichment Scheme and the China Scholarship Council.
