{"title":"Predicting the value of football players: machine learning techniques and sensitivity analysis based on FIFA and real-world statistical datasets","authors":"Qijie Shen","doi":"10.1007/s10489-024-06189-0","DOIUrl":null,"url":null,"abstract":"<div><p>The study focuses on applying machine learning methodologies to football player data for predicting player market values in the dynamic football market. Player datasets are rich, encompassing performance metrics, physiological attributes, and contextual variables. Machine learning models, including both traditional and advanced methods, effectively extract insights from complex data to estimate player market values. Addressing challenges like overfitting and computational complexity involves applying regularization, feature engineering, and interpretability tools to manage high-dimensional data and improve predictive accuracy. In this study sensitivity of selected models (Support Vector Regression (SVR), Random Forest Regression (RFR), Extreme Gradient Boosting (XGB), and Categorical Boosting (CAT)) models to extracted data from FIFA 19 and Real-world Statistical Datasets evaluated by Shapley Additive Explanations (SHAP) and the 20 most relevant features selected in the ranking of SHAP for each regression model. Then, models optimized with two meta-heuristic algorithms demonstrated their performance in predicting the market values of players. Dempster-Shafer Theory (DST) was utilized to develop an ensemble of models to overcome overfitting problems, and Fourier amplitude sensitivity testing (FAST) gave insight for future data extractions. The analysis of market values for players revealed significant model performance variations. XGSC hybrid model demonstrated exceptional precision with a minimal error of 1.7 million dollars (10% of average measured value), followed by RSCX_SC with misestimations of 2 million dollars (13.3% of average measured value). 
Extracted results suggested that models, especially ensemble form, offer reliable accuracy for club managers and stakeholders, aiding in strategic player selection based on previous performance. This approach proves particularly beneficial for optimizing player salaries, especially when considering a prominent team with market values above average.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 4","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-024-06189-0","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
This study applies machine learning to football player data to predict player market values in the dynamic football market. Player datasets are rich, encompassing performance metrics, physiological attributes, and contextual variables. Machine learning models, both traditional and advanced, can extract insights from such complex data to estimate market values. Challenges such as overfitting and computational complexity are addressed through regularization, feature engineering, and interpretability tools, which help manage high-dimensional data and improve predictive accuracy. The sensitivity of four selected models, Support Vector Regression (SVR), Random Forest Regression (RFR), Extreme Gradient Boosting (XGB), and Categorical Boosting (CAT), to data extracted from FIFA 19 and real-world statistical datasets was evaluated with Shapley Additive Explanations (SHAP), and the 20 features ranked highest by SHAP were selected for each regression model. The models were then optimized with two meta-heuristic algorithms and assessed on predicting player market values. Dempster-Shafer Theory (DST) was used to build an ensemble of models to mitigate overfitting, and the Fourier amplitude sensitivity test (FAST) provided insight for future data extraction. The analysis revealed significant variation in model performance. The XGSC hybrid model was the most precise, with the lowest error of 1.7 million dollars (10% of the average measured value), followed by RSCX_SC with an error of 2 million dollars (13.3% of the average measured value). The results suggest that the models, especially in ensemble form, offer reliable accuracy for club managers and stakeholders, aiding strategic player selection based on previous performance. This approach is particularly beneficial for optimizing player salaries, especially for prominent teams whose market values are above average.
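The abstract does not say how Dempster-Shafer Theory is applied to the regression outputs, so the following is only a minimal sketch of the general idea behind a DST ensemble: Dempster's rule of combination fusing the beliefs of two sources. The value bands ("low", "mid", "high"), the two hypothetical models, and all mass values are assumptions made up for illustration; they are not taken from the paper.

```python
from itertools import product


def combine(m1, m2):
    """Fuse two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of one function sum to 1.
    """
    fused = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            # Agreeing evidence supports the intersection of the two sets.
            fused[overlap] = fused.get(overlap, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contribute to the conflict mass K.
            conflict += mass_a * mass_b
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict and cannot be combined")
    # Renormalize by 1 - K so the fused masses sum to 1 again.
    return {h: mass / (1.0 - conflict) for h, mass in fused.items()}


# Hypothetical frame of discernment: which value band a player falls into.
LOW, MID, HIGH = "low", "mid", "high"
FRAME = frozenset({LOW, MID, HIGH})

# Illustrative (made-up) beliefs of two models about one player.
model_a = {frozenset({MID}): 0.6, frozenset({HIGH}): 0.3, FRAME: 0.1}
model_b = {frozenset({MID}): 0.5, frozenset({MID, HIGH}): 0.3, FRAME: 0.2}

fused = combine(model_a, model_b)
for hypothesis, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(hypothesis), round(mass, 4))
```

Mass placed on the whole frame expresses a model's uncertainty rather than a vote, which is why the fused result concentrates on the band both models support without requiring either to be fully confident.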
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.