The Pythagorean Win-Loss formula can be used to estimate winning percentages for sporting events. The formula was initially developed by baseball statistician Bill James and was later extended by other researchers to sports such as football, basketball, and ice hockey. Although one can calculate actual winning percentages from the outcomes of played games, that approach does not take the margin of victory into account. The key benefit of the Pythagorean formula is that it uses the actual average runs scored and the actual average runs allowed. This article applies the Pythagorean Win-Loss formula to two limited-overs cricket formats, One Day International (ODI) cricket and Twenty20 cricket. The data were drawn from matches played by the top 10 International Cricket Council (ICC) members that participated in the 2019 ICC Cricket World Cup. For matches won by the team batting second, runs scored were estimated from the remaining resources, based on the Duckworth–Lewis method.
Title: Predicting the winning percentage of limited-overs cricket using the Pythagorean formula. Authors: H. K. Senevirathne, Ananda B. W. Manage. Journal of Sports Analytics, published 2021-04-16. DOI: https://doi.org/10.3233/JSA-200480
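The Pythagorean estimate described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the exponent of 2 is Bill James's classic baseball value, and the abstract does not state which exponent was fitted for cricket, so `exponent` is left as a parameter. The helper `project_full_innings_runs` is a hypothetical, simplified stand-in for the resource-based adjustment; it assumes runs scale in proportion to the resources used, which may differ from the paper's procedure.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimate winning percentage as RS^g / (RS^g + RA^g).

    exponent=2.0 is the classic baseball value (an assumption here);
    the value appropriate for ODI or Twenty20 cricket must be fitted.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    if rs + ra == 0:
        raise ValueError("at least one run total must be positive")
    return rs / (rs + ra)


def project_full_innings_runs(runs, resources_remaining):
    """Hypothetical helper: scale a chasing side's score to a full innings.

    resources_remaining is the unused fraction of resources (0 <= r < 1),
    e.g. read from a Duckworth-Lewis table. Proportional scaling is an
    assumption made for illustration only.
    """
    if not 0 <= resources_remaining < 1:
        raise ValueError("resources_remaining must be in [0, 1)")
    return runs / (1 - resources_remaining)


# Illustrative (made-up) averages: 250 runs scored, 230 runs allowed per match.
estimate = pythagorean_win_pct(250, 230)  # roughly 0.54
```

A team that scores and concedes the same average gets an estimate of exactly 0.5, which is a quick sanity check on any chosen exponent.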
In this paper we analyze the factors affecting the length of a men’s professional tennis match and propose a model to simulate match durations. Two distinctive features of the model are that i) it considers all kinds of events that affect the duration of a match and ii) it is based only on publicly available data. Once built, the model makes it possible to analyze the impact of different formats or rule changes on match duration. The model is built and validated using a dataset of 19,961 matches played in the period January 2011 – December 2018. The simulated and observed distributions of the durations are compared with an in-depth goodness-of-fit analysis, which shows that the model describes the real distribution well both in the central part and in the tails. We also show that our model improves on similar models in the literature. Finally, several case studies are analyzed: the effect of abolishing the first service, the advantages, or both; the new tie-break format at Wimbledon; and the introduction of a fifth-set tie-break at Roland Garros.
Title: Modeling and simulating durations of men’s professional tennis matches by resampling match features. Authors: Francesco Lisi, Matteo Grigoletto. Journal of Sports Analytics, published 2021-03-13. DOI: https://doi.org/10.3233/JSA-200455
In sports, including Test cricket, athletes from years past serve as performance role models and set benchmarks for subsequent generations of players. Sports fans often wonder: are the players of today as good as the greats of the past? In other words, how do today’s athletes compare with the greats of yesteryear? This paper attempts to answer that question for Test match cricket. We applied data mining to the batting performance of eighty now-retired Test Cricket Greats (TCG from here on) from eight major Test cricket countries. Batting performance attributes included batting average, strike rate, and numbers of fifties and hundreds scored, among others. Using k-Means cluster analysis, TCG performance records were classified into three clusters, which formed our Training Model. Two clusters were populated by established batsmen, and the third cluster included bowlers, all-rounders with significant bowling, and some batsmen. The Learning Model was applied to predict classifications of thirty-two Test Cricket Active (TCA from here on) players. Statistical tests were performed, cluster-wise, to highlight similarities and dissimilarities between TCA and TCG players. Results show that several active players, while still mid-career, have already achieved batting performance records that are on par with the best of TCG.
Title: Are today’s Test cricket batsmen better than the greats of yesteryears? A comparative analysis. Authors: Anil Gulati, C. Mutigwe. Journal of Sports Analytics, published 2021-01-01. DOI: https://doi.org/10.3233/jsa-200503
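The workflow described in the abstract above (cluster the retired players with k-Means, then assign each active player to the nearest cluster) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses only two features (batting average, strike rate), the numbers are invented, the initial centroids are supplied by hand, and the features are left unscaled for brevity, whereas a real analysis would normally standardize them first.

```python
import math


def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [predict(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


def predict(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


# Invented (batting average, strike rate) records for nine retired players.
retired = [(55, 50), (52, 48), (50, 55),
           (40, 45), (42, 44), (38, 47),
           (15, 40), (12, 35), (18, 38)]
labels, centroids = kmeans(retired, centroids=[(55, 50), (40, 45), (15, 40)])

# Classify a hypothetical active player against the fitted centroids.
active_cluster = predict((48, 52), centroids)
```

Supplying the initial centroids keeps the sketch deterministic; library implementations typically choose them randomly and rerun several times.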
The golf director problem is a sports management problem that aims to find an allocation of golf players into fair teams for certain golf club competitions. The motivation for fairness as the objective is that club golf competitions are recreational events for which the golf director needs to form teams that are competitive even though they consist of players with different skill levels, as measured by their USGA (http://www.usga.org) or R&A (http://www.randa.org) handicaps. We formalize the concept of “fairness” of an allocation of players into teams playing 18-hole golf games and argue that finding an optimal assignment of players to teams is intractable for even the fastest computers. Instead, we provide an efficient simulation and optimization-based procedure that finds a near-optimal fair team allocation. Computational tests show the approach to be better than standard methods. A computer implementation of the solution method is publicly available at http://www.fairgolfteams.com. The website provides a golf director with a variety of controls to manage and run club golf competitions in a fair way, as described in the appendix.
Title: Algorithms and Software for the Golf Director Problem. Authors: G. Benincasa, Konstantin Pavlikov, D. Hearn. Journal of Sports Analytics, published 2020-09-17. DOI: https://doi.org/10.3233/jsa-200346
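To make the allocation problem above concrete, here is a simple greedy baseline. It is not the simulation and optimization-based procedure from the paper, and it uses a much cruder notion of fairness (near-equal team handicap totals) than the one the authors formalize. The function name, the player names, and the handicaps are all made up for illustration.

```python
import math


def balance_teams(handicaps, n_teams):
    """Greedy baseline: spread players so team handicap totals stay close.

    handicaps maps player name -> handicap. Players are taken from the
    highest handicap down, and each goes to the open team with the lowest
    running total. Team sizes differ by at most one.
    """
    if n_teams < 1:
        raise ValueError("n_teams must be at least 1")
    max_size = math.ceil(len(handicaps) / n_teams)
    teams = [[] for _ in range(n_teams)]
    totals = [0.0] * n_teams
    ranked = sorted(handicaps.items(), key=lambda kv: kv[1], reverse=True)
    for name, hcp in ranked:
        # Only teams that still have room are candidates.
        open_teams = [i for i in range(n_teams) if len(teams[i]) < max_size]
        target = min(open_teams, key=lambda i: totals[i])
        teams[target].append(name)
        totals[target] += hcp
    return teams, totals


# Invented club roster: eight players split into two teams of four.
roster = {"A": 2, "B": 5, "C": 8, "D": 10, "E": 14, "F": 18, "G": 22, "H": 25}
teams, totals = balance_teams(roster, n_teams=2)
```

A heuristic like this gives a quick starting point, but equal handicap totals do not guarantee equal chances of winning, which is the gap a simulation-based fairness measure is meant to close.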
Ethan M Stewart, Megan Stewart, J. Simpson, A. Knight, H. Chander, R. Shapiro
To successfully hit a baseball, hitters must use a series of preparatory movements (swing phases), which include shifting their body weight, stepping, landing, and swinging. The purpose of this study was to examine the differences between start times for the swing phases (shifting, stepping, landing, and swinging) in currently active baseball players. Participants (n = 12) were all current collegiate baseball athletes. Retroreflective markers, surface electromyography (EMG), and two force platforms were used to complete a swing analysis. Each participant completed five swinging trials off a tee. All dependent variables were compared using a repeated measures 1×4 ANOVA with LSD post hoc comparison (p < 0.05) if necessary. The results demonstrated that the participants started the swing phases in a statistically significant sequence of shifting, stepping, landing, and swinging. The ability of the athletes to start the swing phases in this sequential order may help them regulate the spatial parameters of their swing and provide more time to generate power. These results allow coaches to better understand how to instruct their athletes to be successful at the plate.
Title: Sequential order of swing phase initiation in baseball. Authors: Ethan M Stewart, Megan Stewart, J. Simpson, A. Knight, H. Chander, R. Shapiro. Journal of Sports Analytics, published 2020-09-17. DOI: https://doi.org/10.3233/jsa-200394
We address the question of how to quantify the contributions of groups of players to team success. Our approach is based on spectral analysis, a technique from algebraic signal processing, which has several appealing features. First, our analysis decomposes the team success signal into components that are naturally understood as the contributions of player groups of a given size: individuals, pairs, triples, fours, and full five-player lineups. Second, the decomposition is orthogonal, so the contributions of a player group can be thought of as pure: contributions attributed to a group of three, for example, have been separated from the lower-order contributions of constituent pairs and individuals. We present a detailed spectral analysis using NBA play-by-play data and show how this can be a practical tool for understanding lineup composition and utilization.
Title: Identifying group contributions in NBA lineups with spectral analysis. Authors: Stephen Devlin, D. Uminsky. Journal of Sports Analytics, published 2020-06-25. DOI: https://doi.org/10.3233/jsa-200407
This paper presents the findings of a study to predict the winner of a One Day International (ODI) cricket game after the completion of the first innings. We use a Naive Bayes (NB) approach to make this prediction, using data collected on 15 features comprising variables related to batting, bowling, team composition, and other factors. After constructing an initial model, our objective is to improve the accuracy of predicting the winner using feature selection algorithms, namely univariate selection, recursive elimination, and principal component analysis (PCA). Furthermore, we examine how the ratio of training sample size to testing sample size affects the accuracy of prediction. According to the experimental findings, the accuracy of winner prediction can be improved with the use of a feature selection algorithm. Moreover, the accuracy of winner prediction is highest (85.71%) with the univariate feature selection method, compared to its counterparts. By selecting an appropriate ratio of training sample size to testing sample size, the prediction accuracy can be increased further.
Title: Naive Bayes approach to predict the winner of an ODI cricket game. Authors: I. Wickramasinghe. Journal of Sports Analytics, published 2020-06-19. DOI: https://doi.org/10.3233/jsa-200436
Taking advantage of space and time is a major focus of tennis coaching, yet few statistical measures exist to evaluate a player’s spatio-temporal performance in matches. The present study proposes the time-to-net as a single metric capturing both the space and time characteristics of the quality of a shot. Tracking data from the 2017 Australian Open allowed a detailed investigation of the characteristics and predictive value of the time-to-net in 33,913 men’s and 19,195 women’s shots. For groundstrokes, the majority of men’s and women’s shots have a time-to-net between 200 and 800 ms. The expected time-to-net was found to vary significantly by gender, shot type, and where in a rally the shot occurred. We found considerable between-player differences in the average time-to-net of groundstrokes when serving or receiving, indicating the potential for time-to-net to capture differences in playing style. Time-to-net increased the prediction accuracy of point outcomes by 8 percentage points. These findings show that time-to-net is a simple spatio-temporal statistic that has descriptive and predictive value for performance analysis in tennis.
Title: Analysing time pressure in professional tennis. Authors: Miha Mlakar, S. Kovalchik. Journal of Sports Analytics, published 2020-06-19. DOI: https://doi.org/10.3233/jsa-200406
Title: Reducing starting position bias in the Speedway Grand Prix. Authors: C. A. Williamson. Journal of Sports Analytics, published 2020-06-19. DOI: https://doi.org/10.3233/jsa-200453
Title: On the relationship between +/– ratings and event-level performance statistics. Authors: G. Gelade, L. M. Hvattum. Journal of Sports Analytics, published 2020-06-19. DOI: https://doi.org/10.3233/jsa-200432