Artificial Intelligence in the Selection of Top-Performing Athletes for Team Sports: A Proof-of-Concept Predictive Modeling Study
Abstract
Accurate, scalable athlete evaluation in team sports remains challenging, which motivates the use of artificial-intelligence models to support objective assessment. This study develops and validates a predictive model that classifies team-sport athletes as high- or low-performance, with calibrated probabilities and operationally tested decision thresholds, using a synthetic, literature-informed dataset (n = 400). Labels were defined a priori by simulated group membership; a composite score was retained only for post-hoc checks, to avoid circularity. LightGBM served as the primary classifier and was compared with Logistic Regression (L2), Random Forest, and XGBoost. Performance was evaluated with stratified, nested 5×5 cross-validation. Calibrated, deployment-ready probabilities were obtained by selecting a monotonic mapping (Platt or isotonic) in the inner CV, and two operating points were pre-specified: screening (recall-oriented, with precision ≥ 0.70) and shortlisting (F1-optimized). Under this protocol, the model achieved 89.5% accuracy and a ROC-AUC of 0.93. SHAP analyses identified VO₂max, decision latency, maximal strength, and reaction time as the leading contributors, with domain-consistent directions. Because the data are synthetic, these results are a proof of concept and should be read as an upper bound on performance; they require external validation. Taken together, the pipeline offers a transparent, reproducible, and ethically neutral template for athlete selection and targeted training in team sports; calibration and pre-specified thresholds align the approach with real-world decision-making.
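The two operating points described in the abstract can be illustrated with a minimal sketch. It assumes calibrated probabilities are already available and shows one plausible way to pick a screening threshold (highest recall subject to precision ≥ 0.70) and a shortlisting threshold (highest F1). The function names, the tie-breaking rule, and the toy labels and probabilities are hypothetical and are not taken from the study; the sketch does not reproduce the authors' nested cross-validation or calibration steps.

```python
from typing import List, Optional, Tuple


def precision_recall_f1(
    y_true: List[int], probs: List[float], threshold: float
) -> Tuple[float, float, float]:
    """Precision, recall and F1 when 'high-performance' is predicted for prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, probs) if p < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def screening_threshold(
    y_true: List[int], probs: List[float], min_precision: float = 0.70
) -> Optional[float]:
    """Recall-oriented operating point: maximise recall subject to a precision floor.

    Ties in recall are broken in favour of higher precision (an assumption of
    this sketch). Returns None if no threshold meets the precision floor.
    """
    best_key, best_t = None, None
    for t in sorted(set(probs)):  # candidate thresholds = observed probabilities
        precision, recall, _ = precision_recall_f1(y_true, probs, t)
        if precision < min_precision:
            continue
        key = (recall, precision)
        if best_key is None or key > best_key:
            best_key, best_t = key, t
    return best_t


def shortlisting_threshold(y_true: List[int], probs: List[float]) -> float:
    """F1-optimised operating point: the candidate threshold with the highest F1."""
    return max(sorted(set(probs)), key=lambda t: precision_recall_f1(y_true, probs, t)[2])


# Toy, made-up data for illustration only (1 = high-performance athlete).
Y_TRUE = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1]
PROBS = [0.10, 0.20, 0.30, 0.40, 0.45, 0.50, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]

SCREENING_T = screening_threshold(Y_TRUE, PROBS)
SHORTLISTING_T = shortlisting_threshold(Y_TRUE, PROBS)
```

On this toy input the screening threshold is 0.40 (all high performers retained, at precision 0.80) and the shortlisting threshold is 0.60 (one high performer missed, no false positives). In a setting like the one described, such thresholds would presumably be chosen on inner-fold data and then applied unchanged to the held-out outer fold.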