Accurate prediction of sports-related injuries is essential for optimizing athlete health and performance. This study evaluated machine learning (ML) models for predicting injury risk in 300 male professional football players (ages 18-28) monitored over two competitive seasons (2021-2022). Injuries were defined as musculoskeletal conditions causing at least one missed training session or match, confirmed via ICD-10 diagnoses. Daily data on training workload, recovery, wellness, heart-rate variability, cumulative minutes played, and injury history were collected. Features were preprocessed with normalization and one-hot encoding, then selected via LASSO regression and recursive feature elimination. Missing data (< 3%) were imputed using multiple imputation by chained equations, and class imbalance was addressed with SMOTE and weighting. Logistic regression, decision tree, and random forest models were trained using 10-fold cross-validation and evaluated on accuracy, precision, recall, F1-score, and AUC. The random forest outperformed the other models, achieving an accuracy of 85.6 ± 2.1%, precision of 82.1 ± 1.9%, recall of 80.3 ± 2.4%, F1-score of 81.2 ± 2.2%, and AUC of 90.5 ± 1.6%. Explainable AI techniques (SHAP and LIME) identified prior injury, training intensity, and recovery time as the strongest predictors, enabling individualized risk assessment. These findings indicate that, in this cohort, ensemble ML methods can provide robust, interpretable, and actionable insights for injury prevention, supporting data-driven strategies to optimize training and reduce injury incidence. Future work should extend validation to other sports and integrate additional physiological and genetic factors to improve predictive accuracy and generalizability.
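As an illustration of how per-fold metrics of the kind reported above can be computed and summarized as mean ± spread, the following is a minimal sketch using only the Python standard library. It is not the authors' code: the function names `binary_metrics` and `summarize_folds` are hypothetical, the labels in the usage note are toy values rather than study data, and treating the reported "±" as a sample standard deviation across folds is an assumption, since the abstract does not say what the interval represents.

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = injured)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def summarize_folds(fold_metrics):
    """Map each metric name to (mean, sample SD) across folds.

    Assumes at least two folds, because a sample SD needs two values.
    """
    summary = {}
    for name in fold_metrics[0]:
        values = [m[name] for m in fold_metrics]
        summary[name] = (mean(values), stdev(values))
    return summary
```

In a 10-fold setup, `binary_metrics` would be called once per held-out fold and the ten resulting dictionaries passed to `summarize_folds`. For example, with toy labels `y_true = [1, 1, 1, 0, 0, 0, 0, 0]` and `y_pred = [1, 1, 0, 1, 0, 0, 0, 0]`, there are two true positives, one false negative, one false positive, and four true negatives, giving an accuracy of 0.75 and precision, recall, and F1 of 2/3 each.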
