Abstract The wage of a football player is a function of numerous aspects such as the player’s skills, performance in previous seasons, age, trajectory of improvement, personality, and more. Based on these aspects, salaries of football players are determined through negotiation between team management and agents. In this study we propose an objective quantitative method to determine football players’ wages based on their skills. The method applies pattern recognition algorithms to performance (e.g., scoring), behavior (e.g., aggression), and abilities (e.g., acceleration) data of football players. Experimental results using data from 6,082 players show that the Pearson correlation between the predicted and actual salaries of the players is ~0.77 (p < .001). The proposed method can be used as an assistive technology when negotiating players’ salaries, as well as for quantitative analysis of links between the salary and the performance of football players. The method is based on the performance and skills of the players and does not take into account aspects that are not directly related to the game, such as the popularity of the player among fans or predicted merchandise sales, which also have a high impact on salary, especially for team-leading players and superstars. Analysis of player salaries in eight European football leagues shows that the skills that most affect salary are largely consistent across leagues, although some differences exist. Analysis of underpaid and overpaid players shows that overpaid players tend to be stronger but are inferior in their reactions, vision, acceleration, agility, and balance compared to underpaid players.
L. Yaldo, L. Shamir. "Computational Estimation of Football Player Wages." International Journal of Computer Science in Sport, vol. 16, no. 1, pp. 18-38, 2017-07-01. https://doi.org/10.1515/ijcss-2017-0002
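The evaluation metric reported in this abstract, the Pearson correlation between predicted and actual salaries, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `pearson_r` and the salary figures in the usage note are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the centred values.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical weekly wages (illustrative numbers only, not from the study).
actual = [20.0, 35.0, 50.0, 80.0, 120.0]
predicted = [25.0, 30.0, 55.0, 70.0, 100.0]
r = pearson_r(predicted, actual)
```

A value of `r` close to 1 means the predicted wages rank and scale the players much like the actual wages do. It does not by itself show that individual predictions are close in absolute terms.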
Abstract Self-Organizing Maps (SOMs) are increasingly integrated as data-analysis tools in human movement and sport science. One of the issues limiting researchers’ confidence in their applications and conclusions concerns the (arbitrary) selection of training parameters, their effect on the quality of the SOM, and the sensitivity of any subsequent analyses. In this paper, we demonstrate how quality and sensitivity may be examined to increase the validity of SOM-based data analysis. For this purpose, we use two related data sets where the research question concerns coordination variability in a volleyball spike. SOMs are an attractive tool for analysing this problem because of their ability to reduce the high-dimensional time series to a two-dimensional problem while preserving the topological, non-linear relations in the original data. In a first step, we systematically search the SOM parameter space for a set of options that produces significantly lower continuity, accuracy, and combined map errors, and we discuss the sensitivity of SOM-based analyses of coordination variability to changes in training parameters. In a second step, we further investigate the effect of using different numbers of trials and variables on SOM quality and sensitivity. These sensitivity analyses are able to validate the conclusions from statistical tests. This type of analysis can guide researchers in selecting SOM parameters that optimally represent their data and in examining how those parameters affect subsequent analyses. It may also reinforce confidence in conclusions drawn from studies using SOMs and enhance their integration in human movement and sport science.
B. Serrien, Maarten Goossens, J. Baeyens. "Issues in Using Self-Organizing Maps in Human Movement and Sport Science." International Journal of Computer Science in Sport, vol. 16, no. 1, pp. 1-17, 2017-07-01. https://doi.org/10.1515/ijcss-2017-0001
Abstract Ordinal regression models are frequently used in the academic literature to model outcomes of soccer matches, and seem to be preferred over nominal models. One reason is that there is a natural hierarchy of outcomes, with a victory being preferred to a draw and a draw being preferred to a loss. However, the commonly used ordinal models assume proportional odds: the influence of an independent variable on the log odds is the same for each outcome. This paper illustrates how ordinal regression models therefore fail to fully utilize independent variables that contain information about the likelihood of matches ending in a draw. In practice, however, this flaw does not seem to have a substantial effect on the predictive accuracy of an ordered logit regression model when compared to a multinomial logistic regression model.
L. M. Hvattum. "Ordinal versus nominal regression models and the problem of correctly predicting draws in soccer." International Journal of Computer Science in Sport, vol. 16, no. 1, pp. 50-64, 2017-07-01. https://doi.org/10.1515/ijcss-2017-0004
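The proportional-odds restriction described in this abstract can be made concrete with a small sketch. In an ordered logit, all three outcome probabilities are driven by a single linear predictor (called `eta` here) and two cutpoints (`c1 < c2`), whereas a multinomial logit gives the draw its own predictor. The function names, cutpoints, and numeric values are illustrative assumptions, not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(eta, c1, c2):
    """(P(loss), P(draw), P(win)) under proportional odds.

    Cumulative log odds are c1 - eta and c2 - eta, so a change in eta
    shifts both thresholds by exactly the same amount.
    """
    p_loss = sigmoid(c1 - eta)
    p_loss_or_draw = sigmoid(c2 - eta)
    return p_loss, p_loss_or_draw - p_loss, 1.0 - p_loss_or_draw


def multinomial_logit_probs(eta_draw, eta_win):
    """(P(loss), P(draw), P(win)) with loss as the reference outcome.

    The draw has its own linear predictor, so a covariate can raise the
    draw probability without favouring either team.
    """
    scores = [0.0, eta_draw, eta_win]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return tuple(w / total for w in weights)


# Under proportional odds the draw probability peaks where eta sits
# midway between the cutpoints and falls as eta moves either way.
balanced = ordered_logit_probs(0.0, -0.5, 0.5)
one_sided = ordered_logit_probs(1.0, -0.5, 0.5)
```

In the ordered model, a covariate can only affect the draw probability by moving `eta`, which at the same time moves the win and loss probabilities in opposite directions. In the multinomial model, increasing `eta_draw` alone raises the draw probability at the expense of both other outcomes.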
Abstract The intention of Key Performance Indicators (KPIs) is to map complex system behaviour to single values for scaling, rating, and ranking systems or system components. Very often, however, this mapping discards important information about tactical behaviour or playing dynamics without replacing it with anything useful. The presented approach tries to bridge the gap between complex dynamics and numerical indicators for offensive effectiveness in soccer in two steps. First, a model is developed which visualises offensive actions in a process-oriented way by using information units, i.e. Key Performance Indicators, to represent offensive performance. Second, this model is organised in relation to time intervals, which makes it possible to measure effectiveness for a whole half-time as well as for arbitrary intervals of any desired length. This contribution is meant as an introduction to a new modelling idea, with examples calculated as case studies to demonstrate how it works. Accordingly, only two games have been analysed so far. The first, which is used to demonstrate the method, is an example of similar quantitative indicators but different dynamic behaviour. The second is used to demonstrate the results for teams of extremely different strengths.
J. Perl, D. Memmert. "A Pilot Study on Offensive Success in Soccer Based on Space and Ball Control – Key Performance Indicators and Key to Understand Game Dynamics." International Journal of Computer Science in Sport, vol. 16, no. 1, pp. 65-75, 2017-07-01. https://doi.org/10.1515/ijcss-2017-0005
Abstract The aim of this study was to analyse the general properties of the networks of elite football teams that participated in the UEFA Champions League 2015–2016. An analysis of variance of the general network measures was carried out between levels of performance in the competition. Moreover, the association between performance variables (goals, shots, and percentage of ball possession) and general network measures was also tested. The best sixteen teams that participated in the UEFA Champions League 2015–2016 were analysed in a total of 109 official matches. Statistically significant differences between maximum stages reached in the competition were found in total links (p = 0.003; ES = 0.087), network density (p = 0.003; ES = 0.088), and clustering coefficient (p = 0.007; ES = 0.078). Total links (r = 0.439; p = 0.001), network density (r = 0.433; p = 0.001), and clustering coefficient (r = 0.367; p = 0.001) had moderate positive correlations with percentage of ball possession. This study revealed that teams that reached the quarterfinals and finals had greater values of general network measures than the remaining teams, suggesting that higher homogeneity in the network process may improve the success of the teams. Moderate correlations were found between ball possession and the general network measures, suggesting that teams with more capacity to perform longer passing sequences may involve more players in a more homogeneous manner.
F. Clemente, F. Martins. "Network structure of UEFA Champions League teams: association with classical notational variables and variance between different levels of success." International Journal of Computer Science in Sport, vol. 16, no. 1, pp. 39-50, 2017-07-01. https://doi.org/10.1515/ijcss-2017-0003
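The three general network measures named in this abstract (total links, network density, clustering coefficient) can be sketched for a passing network as below. The abstract does not give the exact definitions used in the study, so this sketch assumes common textbook ones: links are distinct directed passer-receiver pairs, density is links divided by n(n-1), and the clustering coefficient is the average local clustering of the undirected version of the graph. The function name and player labels are hypothetical.

```python
from itertools import combinations


def passing_network_measures(players, passes):
    """Return (total_links, density, clustering) for a passing network.

    players: list of player labels; every player in `passes` must be listed.
    passes:  iterable of (passer, receiver) pairs; repeats count once.
    """
    links = {(a, b) for a, b in passes if a != b}
    n = len(players)
    total_links = len(links)
    # Directed density: observed links over all possible ordered pairs.
    density = total_links / (n * (n - 1)) if n > 1 else 0.0

    # Undirected neighbourhoods for the clustering coefficient.
    neighbours = {p: set() for p in players}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    local = []
    for p in players:
        k = len(neighbours[p])
        if k < 2:
            local.append(0.0)  # no pair of neighbours to connect
            continue
        closed = sum(1 for u, v in combinations(neighbours[p], 2)
                     if v in neighbours[u])
        local.append(closed / (k * (k - 1) / 2))
    clustering = sum(local) / n if n else 0.0
    return total_links, density, clustering


# Hypothetical three-player example: A and B exchange passes, B passes to C.
measures = passing_network_measures(
    ["A", "B", "C"], [("A", "B"), ("B", "A"), ("B", "C")]
)
```

Higher density and clustering under these definitions mean that passes are spread over more pairs of players, which is the sense of homogeneity the abstract refers to.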
D. Carey, K-L. Ong, R. Whiteley, K. Crossley, J. Crow, M. Morris
Abstract To investigate whether training load monitoring data could be used to predict injuries in elite Australian football players, data were collected from athletes over 3 seasons at an Australian football club. Loads were quantified using GPS devices, accelerometers, and player perceived exertion ratings. Absolute and relative training load metrics were calculated for each player each day. Injury prediction models (regularised logistic regression, generalised estimating equations, random forests, and support vector machines) were built for non-contact, non-contact time-loss, and hamstring-specific injuries using the first two seasons of data. Injury predictions were then generated for the third season and evaluated using the area under the receiver operating characteristic curve (AUC). Predictive performance was only marginally better than chance for models of non-contact and non-contact time-loss injuries (AUC<0.65). The best performing model was a multivariate logistic regression for hamstring injuries (best AUC=0.76). Injury prediction models built using training load data from a single club showed poor ability to predict injuries when tested on previously unseen data, suggesting limited application as a daily decision tool for practitioners. Focusing the modelling approach on specific injury types and increasing the number of training observations may improve predictive models for injury prevention.
D. Carey, K-L. Ong, R. Whiteley, K. Crossley, J. Crow, M. Morris. "Predictive Modelling of Training Loads and Injury in Australian Football." International Journal of Computer Science in Sport, vol. 17, no. 1, pp. 49-66, 2017-06-14. https://doi.org/10.2478/ijcss-2018-0002
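The AUC values quoted in this abstract can be read as the probability that a randomly chosen injury case receives a higher predicted risk than a randomly chosen non-injury case. A minimal sketch of that pairwise definition follows. The function name `auc` and the example labels and scores are hypothetical, and the study itself may have used a library routine instead.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for an injury, 0 for no injury.
    scores: predicted risk, higher meaning more likely injured.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out season: two injuries among four player-days.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to chance-level ranking, which is why values below 0.65 are described in the abstract as only marginally better than chance. The quadratic loop is fine for illustration but a rank-based formula scales better for full seasons of daily data.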
D. Carey, K-L. Ong, M. Morris, J. Crow, K. Crossley
Abstract The ability of machine learning techniques to predict athlete ratings of perceived exertion (RPE) was investigated in professional Australian football players. RPE is commonly used to quantify internal training loads and manage injury risk in team sports. Data from global positioning systems, heart-rate monitors, accelerometers, and wellness questionnaires were recorded for each training session (n=3398) from 45 professional Australian football players across a full season. A variety of modelling approaches were considered to investigate the ability of objective data to predict RPE. Models were compared using nested cross-validation and root mean square error (RMSE) on RPE predictions. A random forest model using player-normalised running and heart rate variables provided the most accurate predictions (RMSE ± SD = 0.96 ± 0.08 au). A simplification of the model using only total distance, distance covered at speeds between 18 and 24 km·h−1, and the product of total distance and mean speed provided similarly accurate predictions (RMSE ± SD = 1.09 ± 0.05 au), suggesting that running distances and speeds are the strongest predictors of RPE in Australian football players. The ability of non-linear machine learning models to accurately predict athlete RPE has applications in live player monitoring and training load planning.
D. Carey, K-L. Ong, M. Morris, J. Crow, K. Crossley. "Predicting ratings of perceived exertion in Australian football players: methods for live estimation." International Journal of Computer Science in Sport, vol. 15, no. 1, pp. 64-77, 2016-12-01. https://doi.org/10.1515/ijcss-2016-0005
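The "RMSE ± SD" figures in this abstract summarise prediction error across cross-validation folds. The sketch below shows how such a summary could be computed once per-fold predictions are available. It does not reproduce the nested cross-validation or the random forest itself, and the function names and numbers are hypothetical.

```python
import math
import statistics


def rmse(actual, predicted):
    """Root mean square error between observed and predicted RPE values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def summarise_folds(fold_rmses):
    """Mean and sample standard deviation of per-fold RMSE values."""
    return statistics.mean(fold_rmses), statistics.stdev(fold_rmses)


# Hypothetical per-fold errors in arbitrary units (au).
mean_rmse, sd_rmse = summarise_folds([0.9, 1.1])
```

Because RPE is reported on a bounded rating scale, an RMSE near 1 au means a typical prediction misses the reported rating by about one scale point.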
Abstract AIM: The current investigation aimed to create an objective rating of Gaelic football teams and to examine factors relating to a team's rating. METHOD: A modified version of the Elo Ratings formula (Elo, 1978) was used to rate Gaelic football teams. A total of 1101 competitive senior inter-county matches from 2010-2015 were incorporated into the calculations. Factors examined between teams included population, registered player numbers, previous success at adult and underage levels, financial income from the GAA, team expenses, and number of clubs in a county. RESULTS: The Elo Ratings formula for Gaelic football was found to have strong predictive ability, correctly predicting the result in 72.90% of 642 matches over a 6-year period. Strong positive correlations were observed between current Elo points and previous success at senior, Under 21, and Under 18 levels. Moderate correlations exist between population figures and current Elo points. Moderate correlations are also evident between the number of registered players in a county and the county’s Elo rating points. CONCLUSION: Gaelic football teams can be objectively rated using a modified Elo Ratings formula. In order to develop a successful senior team, counties should focus on the development of underage players, particularly up to U18 and U21 level.
Shane Mangan, Kieran Collins. "A Rating System For Gaelic Football Teams: Factors That Influence Success." International Journal of Computer Science in Sport, vol. 29, no. 1, pp. 78-90, 2016-12-01. https://doi.org/10.1515/ijcss-2016-0006
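For orientation, the standard Elo update that this study modifies can be sketched as below. The abstract does not describe the modifications, so this is the textbook formula only. The K-factor of 20, the 400-point scale, and the function names are assumptions for illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score of team A against team B under standard Elo."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=20.0):
    """Return the new ratings after one match.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k (assumed value) controls how strongly one result moves the ratings.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Zero-sum update: what A gains, B loses.
    return rating_a + delta, rating_b - delta


def predict_winner(rating_a, rating_b):
    """Pick the higher-rated team; 'A' is returned on equal ratings."""
    return "A" if rating_a >= rating_b else "B"


# Two evenly rated teams; A wins and takes k/2 points from B.
new_a, new_b = elo_update(1500.0, 1500.0, 1.0)
```

A predictive accuracy such as the one reported in the abstract would be obtained by applying a rule like `predict_winner` before each match and counting how often the favoured team won.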
Baseball is a statistics-rich sport, and predicting the winner of a particular Major League Baseball (MLB) game is an interesting and challenging task. To date, there is no definitive formula for determining which factors lead a team to victory, but trends can emerge from the analysis of many years of historical records. Recent studies have concentrated on using and generating new statistics, called sabermetrics, to rank teams and players according to their perceived strengths and then applying these rankings to forecast specific games. In this paper, we employ sabermetric statistics to assess the predictive capabilities of four data mining methods (classification and regression based) for predicting outcomes (win or loss) in MLB regular season games. Our modelling approach uses only past data when making a prediction, drawing on ten years of publicly available data. We create a dataset of cumulative sabermetric statistics for each MLB team during this period, constructed so that data contamination is not possible. The inherent difficulty of this specific sports prediction task is confirmed using two geometry- or topology-based measures of data complexity. Results reveal that the classification scheme forecasts game outcomes better than the regression scheme, and that of the four data mining methods used, SVMs produce the best predictive results, with a mean prediction accuracy of nearly 60% for each team. The evaluation of our model is performed using stratified 10-fold cross-validation.
{"title":"Predicting Win-Loss outcomes in MLB regular season games – A comparative study using data mining methods","authors":"Soto Valero","doi":"10.1515/IJCSS-2016-0007","DOIUrl":"https://doi.org/10.1515/IJCSS-2016-0007","url":null,"abstract":"Baseball is a statistically filled sport, and predicting the winner of a particular Major League Baseball (MLB) game is an interesting and challenging task. Up to now, there is no definitive formula for determining what factors will conduct a team to victory, but through the analysis of many years of historical records many trends could emerge. Recent studies concentrated on using and generating new statistics called sabermetrics in order to rank teams and players according to their perceived strengths and consequently applying these rankings to forecast specific games. In this paper, we employ sabermetrics statistics with the purpose of assessing the predictive capabilities of four data mining methods (classification and regression based) for predicting outcomes (win or loss) in MLB regular season games. Our model approach uses only past data when making a prediction, corresponding to ten years of publicly available data. We create a dataset with accumulative sabermetrics statistics for each MLB team during this period for which data contamination is not possible. The inherent difficulties of attempting this specific sports prediction are confirmed using two geometry or topology based measures of data complexity. Results reveal that the classification predictive scheme forecasts game outcomes better than regression scheme, and of the four data mining methods used, SVMs produce the best predictive results with a mean of nearly 60% prediction accuracy for each team. 
The evaluation of our model is performed using stratified 10-fold cross-validation.","PeriodicalId":38466,"journal":{"name":"International Journal of Computer Science in Sport","volume":"15 1","pages":"91-112"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/IJCSS-2016-0007","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66993383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
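The stratified 10-fold cross-validation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `stratified_k_folds`, the round-robin assignment, and the toy win/loss labels are assumptions made for illustration only. The idea shown is that each fold keeps roughly the same win/loss ratio as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that preserve class proportions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group sample indices by class label (e.g. "win" / "loss").
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each class separately, then deal its indices round-robin
    # across the folds so every fold gets a near-equal share of each class.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy data (assumed): 60 wins and 40 losses.
labels = ["win"] * 60 + ["loss"] * 40
folds = stratified_k_folds(labels, k=10)
```

In each round of cross-validation, one fold would serve as the test set and the remaining nine as the training set. Libraries such as scikit-learn provide a maintained equivalent (`StratifiedKFold`), which would normally be preferred over a hand-written split.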
Abstract The aim of this study was to identify the impact of different tactical behaviors on the winning probability in table tennis. The performance analysis was done by mathematical simulation using a Markov chain model. A total of 259 high-level table tennis games were evaluated by means of a new simulation approach that uses numerical derivation to remove the need for a second modeling step to determine the difficulty of tactical behaviors. Based on the derivation, several mathematical constructs, such as directional derivatives and the gradient, are examined for application in table tennis. Results reveal errors and long rallies as the most influential game situations, together with a positive effect of risky play on the winning probability of losing players.
{"title":"Performance Analysis in Table Tennis - Stochastic Simulation by Numerical Derivation","authors":"S. Wenninger, M. Lames","doi":"10.1515/ijcss-2016-0002","DOIUrl":"https://doi.org/10.1515/ijcss-2016-0002","url":null,"abstract":"Abstract The aim of this study was to identify the impact of different tactical behaviors on the winning probability in table tennis. The performance analysis was done by mathematical simulation using a Markov chain model. 259 high-level table tennis games were evaluated by means of a new simulation approach using numerical derivation to remove the necessity to perform a second modeling step in order to determine the difficulty of tactical behaviors. Based on the derivation, several mathematical constructs like directional derivations and the gradient are examined for application in table tennis. Results reveal errors and long rallies as the most influencing game situations, together with the positive effect of risky play on the winning probability of losing players.","PeriodicalId":38466,"journal":{"name":"International Journal of Computer Science in Sport","volume":"15 1","pages":"22 - 36"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/ijcss-2016-0002","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66993256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
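The combination of a Markov chain model with numerical derivation described in the abstract above can be sketched on a deliberately small example. The two-state rally chain, the parameter names (`a_winner`, `a_error`, `b_winner`, `b_error`), and the rates used are all assumptions for illustration; the paper's actual state space and data are not reproduced here. The sketch computes the probability that player A wins a rally and then estimates, by central finite differences, how sensitive that probability is to each stroke outcome rate.

```python
def point_win_probability(params):
    """P(A wins the rally), given that A strikes the ball first.

    Hypothetical two-state absorbing Markov chain: on each stroke a player
    hits a winner, makes an error, or returns the ball to the opponent.
    """
    # Probability that a stroke neither ends the rally as a winner nor as an error.
    cont_a = 1.0 - params["a_winner"] - params["a_error"]
    cont_b = 1.0 - params["b_winner"] - params["b_error"]
    # With x = P(A wins | A strikes) and y = P(A wins | B strikes):
    #   x = a_winner + cont_a * y
    #   y = b_error  + cont_b * x
    # Solving the two equations for x gives the closed form below.
    return (params["a_winner"] + cont_a * params["b_error"]) / (1.0 - cont_a * cont_b)


def numerical_gradient(f, params, h=1e-6):
    """Central-difference estimate of the partial derivative of f per parameter."""
    grad = {}
    for name in params:
        up = dict(params)
        down = dict(params)
        up[name] += h
        down[name] -= h
        grad[name] = (f(up) - f(down)) / (2.0 * h)
    return grad


# Assumed rates for two evenly matched players.
base = {"a_winner": 0.1, "a_error": 0.1, "b_winner": 0.1, "b_error": 0.1}
p_win = point_win_probability(base)
grad = numerical_gradient(point_win_probability, base)
```

The sign and size of each gradient entry indicate which behavior moves the winning probability most. In this toy chain, A's own winner and error rates have a larger effect than B's, which is a property of the assumed model rather than a finding from the study.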