Abstract We analyse football (soccer) player performance data with mixed-type variables from the 2014-15 season of eight major European leagues. We cluster these data based on a tailor-made dissimilarity measure. To decide between the many available clustering methods and to choose an appropriate number of clusters, we use the approach of Akhanli and Hennig (2020. “Comparing Clusterings and Numbers of Clusters by Aggregation of Calibrated Clustering Validity Indexes.” Statistics and Computing 30 (5): 1523–44). This approach is based on several validation criteria that refer to different desirable characteristics of a clustering. These characteristics are chosen according to the aim of clustering, which allows us to define a suitable validation index as a weighted average of calibrated individual indexes measuring the desirable features. We derive two different clusterings. The first is a partition of the data set into major groups of essentially different players, which can be used to analyse a team’s composition. The second divides the data set into many small clusters (with 10 players on average), which can be used to find players with a profile very similar to that of a given player. We discuss in depth which characteristics are desirable for these clusterings. The weighting of the criteria for the second clustering is informed by a survey of football experts.
{"title":"Clustering of football players based on performance data and aggregated clustering validity indexes","authors":"Serhat Emre Akhanli, C. Hennig","doi":"10.1515/jqas-2022-0037","DOIUrl":"https://doi.org/10.1515/jqas-2022-0037","PeriodicalId":16925,"journal":{"name":"Journal of Quantitative Analysis in Sports","volume":"12 1","pages":"103 - 123"},"PeriodicalIF":0.8,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"87941458"}
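The abstract above describes a validation index built as a weighted average of calibrated individual indexes. A minimal sketch of that aggregation step follows. It assumes that "calibrated" can be approximated by z-scoring each index over the candidate clusterings; the calibration procedure of the cited paper is not given in the abstract and is not reproduced here. The index names, values and weights are hypothetical.

```python
from statistics import mean, pstdev


def calibrate(values):
    """Z-score raw index values over all candidate clusterings (assumed calibration)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def aggregate(index_table, weights):
    """Weighted average of calibrated indexes, one score per candidate clustering.

    index_table: dict index name -> list of raw values (larger = better),
                 one value per candidate clustering, same order for every index
    weights:     dict index name -> non-negative weight reflecting the clustering aim
    """
    total = sum(weights[name] for name in index_table)
    calibrated = {name: calibrate(vals) for name, vals in index_table.items()}
    n_candidates = len(next(iter(index_table.values())))
    return [
        sum(weights[name] * calibrated[name][i] for name in index_table) / total
        for i in range(n_candidates)
    ]


# Hypothetical values for three candidate clusterings and two criteria.
table = {"separation": [0.2, 0.5, 0.8], "homogeneity": [0.9, 0.6, 0.3]}
scores = aggregate(table, {"separation": 3.0, "homogeneity": 1.0})
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

Changing the weights changes which candidate wins, which is the point of choosing them from the aim of the clustering.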
Abstract Sports analysis has gained paramount importance for coaches, scouts, and fans. Recently, computer vision researchers have taken on the challenge of collecting the necessary data by proposing several methods of automatic player and ball tracking. Building on the gathered tracking data, data miners are able to perform quantitative analysis on the performance of players and teams. With this survey, our goal is to provide a basic understanding for quantitative data analysts about the process of creating the input data and the characteristics thereof. Thus, we summarize the recent methods of optical tracking by providing a comprehensive taxonomy of conventional and deep learning methods, separately. Moreover, we discuss the preprocessing steps of tracking, the most common challenges in this domain, and the application of tracking data to sports teams. Finally, we compare the methods by their cost and limitations, and conclude the work by highlighting potential future research directions.
{"title":"Optical tracking in team sports","authors":"Pegah Rahimian, László Toka","doi":"10.1515/jqas-2020-0088","DOIUrl":"https://doi.org/10.1515/jqas-2020-0088","PeriodicalId":16925,"journal":{"name":"Journal of Quantitative Analysis in Sports","volume":"16 1","pages":"35 - 57"},"PeriodicalIF":0.8,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"84821423"}
Abstract The Elo rating system contains a coefficient called the K-factor which governs the amount of change to the updated ratings and is often determined by empirical or heuristic means. Theoretical studies on the K-factor have been sparse and not much is known about the pertinent factors that impact its appropriate values in applications. This paper has two main goals: to present a new formulation of the K-factor that is optimal with respect to the mean-squared-error (MSE) criterion in a round-robin tournament setting and to investigate the effects of the relevant variables, including the number of tournament participants n, on the optimal K-factor (based on the model-averaged MSE). It is found that n and the variability of the deviation between the true rating and the pre-tournament rating have a strong influence on the optimal K-factor. Comparisons between the MSE-optimal K-factor and the K-factors from Elo and from the US Chess Federation as a function of n are also provided. Although the results are applicable to other sports in similar settings, the study focuses on chess and makes use of the rating data and the K-factor values from the chess world.
{"title":"MSE-optimal K-factor of the Elo rating system for round-robin tournament","authors":"Victor S. Chan","doi":"10.1515/jqas-2021-0079","DOIUrl":"https://doi.org/10.1515/jqas-2021-0079","PeriodicalId":16925,"journal":{"name":"Journal of Quantitative Analysis in Sports","volume":"53 1","pages":"59 - 72"},"PeriodicalIF":0.8,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"76581251"}
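The abstract above describes the K-factor as the coefficient that governs how much ratings change. As a reminder of where K enters, here is a minimal sketch of the standard Elo update applied once over a round-robin tournament. It is not the MSE-optimal formulation proposed in the paper; the function names, the example ratings and the value K = 20 are illustrative assumptions.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (logistic curve, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_after_round_robin(ratings, results, k):
    """Return post-tournament ratings.

    ratings: dict player -> pre-tournament rating
    results: dict (a, b) -> score of a against b (1, 0.5 or 0), one entry per game
    k:       K-factor; each rating moves by k * sum(actual - expected) over its games
    """
    surplus = {player: 0.0 for player in ratings}
    for (a, b), score_a in results.items():
        e_a = expected_score(ratings[a], ratings[b])
        surplus[a] += score_a - e_a
        surplus[b] += (1.0 - score_a) - (1.0 - e_a)
    return {player: ratings[player] + k * surplus[player] for player in ratings}


# Three equally rated players: A beats B and C, B and C draw.
pre = {"A": 1500.0, "B": 1500.0, "C": 1500.0}
games = {("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "C"): 0.5}
print(update_after_round_robin(pre, games, k=20.0))
```

A larger K makes the post-tournament ratings follow the tournament results more closely, while a smaller K keeps them nearer the pre-tournament ratings.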
Abstract Evaluation of individuals in a team sport setting is inherently difficult. The level of play of one individual is fundamentally tied to the level of play of the teammates. One way to think about evaluation of individuals is to ‘insert’ the posterior distribution of the parameter that measures individual play into an ‘average’ team, and see how the probability of success (or failure) changes. Using a Bayesian hierarchical logistic model, we can estimate both the average contribution to success of various positions, and the individual contribution of all the players in that position. In this paper, we use data from the 2018 World Championships in Volleyball to model both the position played and the players within each position. Using both the posterior distributions for the mean performance of the different positions, and the posterior distributions for the individual players, we can then estimate the change in the number of points scored for a team with a change from an average player to the individual under consideration. We compute both the points scored above average per set (PAAPS) and the points scored above average per 100 touches (PP100) for 168 men and 168 women playing five different positions. Contributions of the various position groups and of individual players within each position are evaluated and compared.
{"title":"Evaluating the performance of elite level volleyball players","authors":"G. Fellingham","doi":"10.1515/jqas-2021-0056","DOIUrl":"https://doi.org/10.1515/jqas-2021-0056","PeriodicalId":16925,"journal":{"name":"Journal of Quantitative Analysis in Sports","volume":"1 1","pages":"15 - 34"},"PeriodicalIF":0.8,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"89615955"}
Pub Date: 2021-11-26 DOI: 10.1515/jqas-2021-frontmatter4
{"title":"Frontmatter","doi":"10.1515/jqas-2021-frontmatter4","DOIUrl":"https://doi.org/10.1515/jqas-2021-frontmatter4","PeriodicalId":16925,"journal":{"name":"Journal of Quantitative Analysis in Sports","volume":"32 1"},"PeriodicalIF":0.8,"publicationDate":"2021-11-26","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"90362442"}
Abstract The paper analyses how draw constraints influence the outcome of a knockout tournament. The research question is inspired by European club football competitions, where the organiser generally imposes an association constraint in the first round of the knockout phase: teams from the same country cannot be drawn against each other. Its effects are explored in both theoretical and simulation models. An association constraint in the first round(s) is found to increase the likelihood of same-nation matchups to approximately the same extent in each subsequent round. If the favourite teams are concentrated in some associations, they will have a higher probability of winning the tournament under this policy, but the increase is less than linear if the constraint is used in more rounds. Our results might explain the recent introduction of the association constraint for both the knockout round play-offs with 16 teams and the Round of 16 in the UEFA Europa League and UEFA Europa Conference League.
{"title":"The effects of draw restrictions on knockout tournaments","authors":"László Csató","doi":"10.1515/jqas-2022-0061","DOIUrl":"https://doi.org/10.1515/jqas-2022-0061","PeriodicalId":16925,"journal":{"name":"Journal of Quantitative Analysis in Sports","volume":"22 1","pages":"227 - 239"},"PeriodicalIF":0.8,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"90863890"}
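The association constraint described in the abstract above (teams from the same country cannot be drawn against each other) can be illustrated with a small simulation building block. The sketch below draws a first-round pairing by rejection sampling: shuffle, pair neighbours, and retry whenever a pair shares an association. This is an assumption made for illustration only. It is neither the organiser's actual draw procedure nor the model used in the paper, and the team names and associations are hypothetical.

```python
import random


def constrained_draw(teams, rng, max_tries=10_000):
    """Draw pairs so that no pair shares an association.

    teams: list of (name, association) tuples, even length
    rng:   random.Random instance, passed in so that draws are reproducible
    """
    if len(teams) % 2 != 0:
        raise ValueError("need an even number of teams")
    for _ in range(max_tries):
        order = list(teams)
        rng.shuffle(order)
        pairs = [(order[i], order[i + 1]) for i in range(0, len(order), 2)]
        # Accept the draw only if every pair mixes two different associations.
        if all(a[1] != b[1] for a, b in pairs):
            return pairs
    raise RuntimeError("no valid draw found; the constraint may be infeasible")


teams = [
    ("A1", "ENG"), ("A2", "ENG"), ("A3", "ENG"),
    ("B1", "ESP"), ("B2", "ESP"),
    ("C1", "GER"), ("C2", "GER"),
    ("D1", "ITA"),
]
for home, away in constrained_draw(teams, random.Random(2022)):
    print(home[0], "vs", away[0])
```

Repeating the draw many times and playing out the later rounds would give estimates of same-nation matchup frequencies with and without the constraint, which is the kind of comparison the abstract reports.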
Abstract We design, describe and implement a statistical engine to analyze the performance of gymnastics judges with three objectives: (1) provide constructive feedback to judges, executive committees and national federations; (2) assign the best judges to the most important competitions; (3) detect bias and persistent misjudging. Judging a gymnastics routine is a random process, and we model this process using heteroscedastic random variables. The developed marking score scales the difference between the mark of a judge and the true performance level of a gymnast as a function of the intrinsic judging error variability estimated from historical data for each apparatus. This dependence between judging variability and performance quality has never been properly studied. We leverage the intrinsic judging error variability and the marking score to detect outlier marks and study the national bias of judges favoring athletes of the same nationality. We also study ranking scores assessing to what extent judges rate gymnasts in the correct order. Our main observation is that there are significant differences between the best and worst judges, both in terms of accuracy and national bias. The insights from this work have led to recommendations and rule changes at the Fédération Internationale de Gymnastique.
{"title":"Judging the judges: evaluating the accuracy and national bias of international gymnastics judges","authors":"Sandro Heiniger, Hugues Mercier","doi":"10.1515/jqas-2019-0113","DOIUrl":"https://doi.org/10.1515/jqas-2019-0113","PeriodicalId":16925,"journal":{"name":"Journal of Quantitative Analysis in Sports","volume":"6 1","pages":"289 - 305"},"PeriodicalIF":0.8,"publicationDate":"2021-10-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"74966788"}
Timothy C. Y. Chan, Douglas Fearing, Craig Fernandes, S. Kovalchik
Abstract Value functions are used in sports to determine the optimal action players should employ. However, most literature implicitly assumes that players can perform the prescribed action with known and fixed probability of success. The effect of varying this probability or, equivalently, “execution error” in implementing an action (e.g., hitting a tennis ball to a specific location on the court) on the design of optimal strategies, has received limited attention. In this paper, we develop a novel modeling framework based on Markov reward processes and Markov decision processes to investigate how execution error impacts a player’s value function and strategy in tennis. We power our models with hundreds of millions of simulated tennis shots with 3D ball and 2D player tracking data. We find that optimal shot selection strategies in tennis become more conservative as execution error grows, and that having perfect execution with the empirical shot selection strategy is roughly equivalent to choosing one or two optimal shots with average execution error. We find that execution error on backhand shots is more costly than on forehand shots, and that optimal shot selection on a serve return is more valuable than on any other shot, over all values of execution error.
{"title":"A Markov process approach to untangling intention versus execution in tennis","authors":"Timothy C. Y. Chan, Douglas Fearing, Craig Fernandes, S. Kovalchik","doi":"10.1515/jqas-2021-0077","DOIUrl":"https://doi.org/10.1515/jqas-2021-0077","PeriodicalId":16925,"journal":{"name":"Journal of Quantitative Analysis in Sports","volume":"7 1","pages":"127 - 145"},"PeriodicalIF":0.8,"publicationDate":"2021-10-04","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"89110515"}
Abstract Calculating the value of a football player’s on-field performance has been limited to scouting methods, while data-driven methods are mostly limited to quarterbacks. Popular methods for calculating player value in other sports are Adjusted Plus–Minus (APM) and Regularized Adjusted Plus–Minus (RAPM) models. These models have been used most notably in basketball (Rosenbaum, D. T. 2004. Measuring How NBA Players Help Their Teams Win. http://www.82games.com/comm30.htm#_ftn1; Kubatko, J., D. Oliver, K. Pelton, and D. T. Rosenbaum. 2007. “A Starting Point for Analyzing Basketball Statistics.” Journal of Quantitative Analysis in Sports 3 (3); Winston, W. 2009. Player and Lineup Analysis in the NBA. Cambridge, Massachusetts; Sill, J. 2010. “Improved NBA Adjusted +/− Using Regularization and Out-Of-Sample Testing.” In Proceedings of the 2010 MIT Sloan Sports Analytics Conference) to estimate each player’s value by accounting for those in the game at the same time. Football is less amenable to APM models due to its few scoring events, few lineup changes, restrictive positioning, and small number of games relative to the number of teams. More recent methods have found ways to incorporate plus–minus models in other sports such as hockey (Macdonald, B. 2011. “A Regression-Based Adjusted Plus-Minus Statistic for NHL players.” Journal of Quantitative Analysis in Sports 7 (3)) and soccer (Schultze, S. R., and C.-M. Wellbrock. 2018. “A Weighted Plus/Minus Metric for Individual Soccer Player Performance.” Journal of Sports Analytics 4 (2): 121–31; Matano, F., L. F. Richardson, T. Pospisil, C. Eubanks, and J. Qin. 2018. Augmenting Adjusted Plus-Minus in Soccer with FIFA Ratings. arXiv preprint arXiv:1810.08032). These models are useful for producing a results-oriented estimate of each player’s value. In American football, many positions such as offensive lineman have no recorded statistics, which hinders the ability to estimate a player’s value. I provide a fully hierarchical Bayesian plus–minus (HBPM) model framework that extends RAPM to include position-specific penalization, which solves many of the shortcomings of APM and RAPM models in American football. Cross-validated results show the HBPM to be more predictive out of sample than RAPM or APM models. Results for the HBPM models are provided for both collegiate and NFL football players, along with deeper insights into positional value and position-specific age curves.
{"title":"Estimating player value in American football using plus–minus models","authors":"R. Sabin","doi":"10.1515/jqas-2020-0033","DOIUrl":"https://doi.org/10.1515/jqas-2020-0033","PeriodicalId":16925,"journal":{"name":"Journal of Quantitative Analysis in Sports","volume":"8 1","pages":"313 - 364"},"PeriodicalIF":0.8,"publicationDate":"2021-08-27","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"84747452"}
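RAPM, as referred to in the abstract above, is commonly formulated as a ridge regression of a scoring margin on player-presence indicators. The sketch below solves the ridge normal equations with one penalty per player, so that players in different positions can be shrunk by different amounts. This per-player penalty is only a non-Bayesian stand-in for the idea of position-specific penalization. It is not the hierarchical Bayesian HBPM model of the paper, and the design matrix, margins and penalty values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_plus_minus(x, y, penalties):
    """Ridge plus-minus: solve (X'X + diag(penalties)) beta = X'y.

    x:         list of rows, one per play or stint; entry +1 if the player is on the
               field for the offense, -1 for the defense, 0 if absent
    y:         list of margins (e.g. points gained) for each row
    penalties: list of positive ridge penalties, one per player
    """
    n_players = len(x[0])
    xtx = [
        [
            sum(row[i] * row[j] for row in x) + (penalties[i] if i == j else 0.0)
            for j in range(n_players)
        ]
        for i in range(n_players)
    ]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(n_players)]
    return solve(xtx, xty)


# Four hypothetical players observed over four plays.
X = [[1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1], [-1, 1, 1, -1]]
y = [3.0, 1.0, -2.0, 0.0]
print(ridge_plus_minus(X, y, [1.0, 1.0, 5.0, 5.0]))  # heavier shrinkage on players 3 and 4
```

Raising a player's penalty pulls that player's estimate toward zero, which matters for positions that rarely change lineups and are therefore hard to separate from their teammates.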
Abstract Rowing needs a standardized Gold Medal Standard (GMS) to clearly compare performance across boat classes in competition. Here, we report a method to factor out environmental effects, developing a fairer GMS for individual rowing events. We used results from World Rowing Championships and Olympic Games (2005–2016) to calculate the difference between the fastest winning time of the day and other event winning times on the same day. From this, we calculated a prognostic GMS time for each event via repeated k-fold cross-validation linear regression. Then, we compared these values with the 10-year average winning time and the World Best Time (WBT). We repeated this process to develop prognostic podium standard (PS) times. The prognostic GMS times (RMSE = 9.47; R² = 0.875) were universally slower than the WBT (current GMS) by 6.2 s on average but faster than the 10-year average by 12.3 s. The prognostic PS times (RMSE = 10.5; R² = 897) were also slower than the WBT but faster than the 10-year average, by 12.2 and 6.3 s respectively. Our time-difference prediction model based on historical data generates non-outlier prognostic times. By using relative time differences, this approach promises a selection standard that is independent of environmental conditions and easily applicable across different sports.
{"title":"Towards a more objective time standard in competitive rowing","authors":"Kenneth M. Kimmins, M. Tsai","doi":"10.1515/jqas-2020-0055","DOIUrl":"https://doi.org/10.1515/jqas-2020-0055","PeriodicalId":16925,"journal":{"name":"Journal of Quantitative Analysis in Sports","volume":"1 1","pages":"307 - 311"},"PeriodicalIF":0.8,"publicationDate":"2021-08-10","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"89791568"}