Community structure of the football transfer market network: the case of Italian Serie A. Lucio Palazzo, Roberto Rondinelli, Filipe Manuel Clemente, Riccardo Ievoli, Giancarlo Ragozini. Journal of Sports Analytics, doi:10.3233/jsa-220661, published 2023-11-09.
The men’s football transfer market is a complex phenomenon that requires suitable methods for an in-depth study. Network analysis can be employed to measure the key elements of the transfer market through network indicators such as degree centrality, hub and authority scores, and betweenness centrality. Furthermore, community detection methods can be used to unveil unobservable patterns of the football market, also taking into account auxiliary variables such as the type of transfer, the age or role of the player, and the agents involved in the transfer flow. These methodologies are applied to the flows of player transfers generated by the 20 teams of the Italian first division (Serie A); these flows include teams from all over the world. We consider the summer market session of 2019, at the beginning of the 2019-2020 season. The results also help to better understand some peculiarities of the Italian football transfer market in terms of the different approaches of the elite teams. Network indices show the presence of different market strategies, highlighting the role of mid-level teams such as Atalanta, Genoa, and Sassuolo. The network reveals a core-periphery structure split into several communities. The Infomap algorithm identifies 14 single-team communities and three communities formed by two teams. Two of the latter are composed of a top team and a mid-level team, suggesting collaboration and similar market behavior, while the third is led by two teams promoted from the second division (Serie B).
{"title":"Community structure of the football transfer market network: the case of Italian Serie A","authors":"Lucio Palazzo, Roberto Rondinelli, Filipe Manuel Clemente, Riccardo Ievoli, Giancarlo Ragozini","doi":"10.3233/jsa-220661","DOIUrl":"https://doi.org/10.3233/jsa-220661","url":null,"abstract":"The men’s football transfer market represents a complex phenomenon requiring suitable methods for an in-depth study. Network Analysis may be employed to measure the key elements of the transfer market through network indicators, such as degree centrality, hub and authority scores, and betweenness centrality. Furthermore, community detection methods can be proposed to unveil unobservable patterns of the football market, even considering auxiliary variables such as the type of transfer, the age or the role of the player, and the agents involved in the transfer flow. These methodologies are applied to the flows of player transfers generated by the 20 teams of the Italian first division (Serie A). These flows include teams from all over the world. We consider the summer market session of 2019, at the beginning of the season 2019-2020. Results also help to better understand some peculiarities of the Italian football transfer market in terms of the different approaches of the elite teams. Network indices show the presence of different market strategies, highlighting the role of mid-level teams such as Atalanta, Genoa, and Sassuolo. The network reveals a core-periphery structure splitted into several communities. The Infomap algorithm identifies 14 single team-based communities and three communities formed by two teams. Two of the latter are composed of a top team and a mid-level team, suggesting the presence of collaboration and similar market behavior, while the third is guided by two teams promoted by the second division (Serie B).","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135191392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Determining the playing 11 based on opposition squad: An IPL illustration. G. Gokul, Malolan Sundararaman. Journal of Sports Analytics, doi:10.3233/jsa-220638, published 2023-11-09.
The Indian Premier League (IPL) is the most popular T20 domestic sporting league globally. Player selection is crucial to winning the competitive IPL tournament, so team management selects 11 players for each match from a squad of 15 to 25 players. Different player statistics are analysed to select the best playing 11 for each match. This study attempts an approach in which on-field player performance is used to determine the playing 11. A player’s on-field performance in a match is computed as a single metric considering the player’s attributes against every player present in the opposition squad. For this computation, past ball-by-ball data are cleaned and mined to generate data containing player-vs-player performance attributes. Next, the performance attributes for each player-vs-player combination are converted into a player’s performance rating by computing a weighted score of the attributes. Finally, an optimisation model is proposed and developed to determine the best playing 11 using the computed performance ratings. The model suggests the playing 11 that maximises the likelihood of winning against a given opponent. The proposed procedure is demonstrated using past data from 2008-20. The demonstration indicates that, for matches in the league stage, the playing 11 suggested by the model and the actual playing 11 have a ∼7% similarity across all teams. The remaining ∼3% are different from those selected in the actual team. Nevertheless, this difference approximately yields a ∼
{"title":"Determining the playing 11 based on opposition squad: An IPL illustration","authors":"G. Gokul, Malolan Sundararaman","doi":"10.3233/jsa-220638","DOIUrl":"https://doi.org/10.3233/jsa-220638","url":null,"abstract":"Indian Premier League (IPL) is the most popular T20 domestic sporting league globally. Player selection is crucial in winning the competitive IPL tournament. Thus, team management select 11 players for each match from a team’s squad of 15 to 25 players. Different player statistics are analysed to select the best playing 11 for each match. This study attempts an approach where the on-field player performance is used to determine the playing-11. A player’s on-field performance in a match is computed as a single metric considering a player’s attributes against every player present in the opposition squad. For this computation, past ball-by-ball data is cleaned and mined to generate data containing player-vs-player performance attributes. Next, the various performance attributes for a player-vs-player combination is converted into a player’s performance rating by computing a weighted score of the performance attributes. Finally, an optimisation model is proposed and developed to determine the best playing-11 using the computed performance ratings. The developed optimisation model suggests the playing-11 that maximises the possibility of winning against a given opponent. The proposed procedure to determine the playing-11 for an IPL match is demonstrated using past data from 2008-20. The demonstration indicates that for matches in the league stage, the suggested playing-11 by model and the actual playing-11 have a ∼7% similarity across all teams. The remaining ∼3% are different from those selected in the actual team. Nevertheless, this difference approximately yields a ∼ Indian Premier League (IPL) is the most popular T20 domestic sporting league globally. Player selection is crucial in winning the competitive IPL tournament. Thus, team management select 11 players for each match from a team’s squad of 15 to 25 players. Different player statistics are analysed to select the best playing 11 for each match. This study attempts an approach where the on-field player performance is used to determine the playing-11. A player’s on-field performance in a match is computed as a single metric considering a player’s attributes against every player present in the opposition squad. For this computation, past ball-by-ball data is cleaned and mined to generate data containing player-vs-player performance attributes. Next, the various performance attributes for a player-vs-player combination is converted into a player’s performance rating by computing a weighted score of the performance attributes. Finally, an optimisation model is proposed and developed to determine the best playing-11 using the computed performance ratings. The developed optimisation model suggests the playing-11 that maximises the possibility of winning against a given opponent. The proposed procedure to determine the playing-11 for an IPL match is demonstrated using past data from 2008-20. 
The demonstration indicates that for matches in the league stage, the suggested playing-11 by model and the a","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135191239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
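The final selection step described above is naturally expressed as a small 0-1 integer program. The sketch below uses PuLP; the squad, the opposition-specific ratings, and the role constraints are illustrative assumptions rather than the weighted scores computed in the study.
```python
# Sketch of the playing-11 selection step as a 0-1 integer program with PuLP.
# Ratings and composition constraints are illustrative assumptions.
import pulp

# Hypothetical squad: name -> (opposition-specific rating, role)
squad = {
    "P01": (8.2, "bat"), "P02": (7.9, "bat"), "P03": (7.4, "bat"),
    "P04": (7.1, "bat"), "P05": (6.8, "bat"), "P06": (7.6, "wk"),
    "P07": (6.9, "all"), "P08": (7.2, "all"), "P09": (7.8, "bowl"),
    "P10": (7.5, "bowl"), "P11": (7.0, "bowl"), "P12": (6.6, "bowl"),
    "P13": (6.4, "bowl"), "P14": (6.1, "bat"), "P15": (5.9, "wk"),
}

prob = pulp.LpProblem("playing_eleven", pulp.LpMaximize)
x = pulp.LpVariable.dicts("pick", list(squad), cat="Binary")

# Objective: total opposition-specific performance rating of the selected XI.
prob += pulp.lpSum(squad[p][0] * x[p] for p in squad)

# Exactly 11 players, with simple (assumed) composition constraints.
prob += pulp.lpSum(x[p] for p in squad) == 11
prob += pulp.lpSum(x[p] for p in squad if squad[p][1] == "wk") >= 1
prob += pulp.lpSum(x[p] for p in squad if squad[p][1] in ("bowl", "all")) >= 5
prob += pulp.lpSum(x[p] for p in squad if squad[p][1] in ("bat", "wk", "all")) >= 6

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([p for p in squad if x[p].value() == 1])
```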
Decision making for basketball clutch shots: A data driven approach. Yuval Eppel, M. Kaspi, Amichai Painsky. Journal of Sports Analytics, doi:10.3233/jsa-220682, published 2023-08-23.
Decision making is considered one of the most important aspects of winning a basketball game. In the final minutes of the game (clutch time), these decisions become even more crucial. In particular, who should take the final, game-winning shots? While some coaches believe it is the team’s star, others may prefer the ‘clutch’ player (who seemingly performs better in clutch time), or the ‘hot’ player who was having a great game that night. In this work we study policy making in clutch minutes. Specifically, we introduce different policies for choosing the shot-taker (for example, according to field goal percentage). Then, we compare the policies and rank them to create a policy hierarchy, which serves as a decision guide for the coach. We show that when our recommendations are implemented (i.e., the highest-ranked player takes the shot) the success rate is significantly greater: 51.2%, compared to 41.3% in commonly taken clutch shots. Furthermore, our results indicate that players who excelled in past clutch shots are more likely to succeed, independently of their performance in the current game.
{"title":"Decision making for basketball clutch shots: A data driven approach","authors":"Yuval Eppel, M. Kaspi, Amichai Painsky","doi":"10.3233/jsa-220682","DOIUrl":"https://doi.org/10.3233/jsa-220682","url":null,"abstract":"Decision making is considered one of the most important aspects for winning a basketball game. In the final minutes of the game (clutch time), these decisions become even more crucial. In particular –who shall take the final, game-winning shots? While some coaches believe it is the team’s star, others may prefer the ‘clutch’ player (who seemingly performs better in clutch time), or the ‘hot’ player who was having a great game that night. In this work we study policy making in clutch minutes. Specifically, we introduce different policies for choosing the shot-taker (for example, according to field goal percentage). Then, we compare the policies and rank them to create a policy hierarchy, which serves as a decision guide for the coach. We show that when our recommendations are implemented (i.e., the highest ranked player takes the shot) the success rate is significantly greater: 51.2%, compared to 41.3% in commonly taken clutch shots. Furthermore, our results indicate that players who excelled in past clutch shots are more likely to succeed, independently to their performance in the current game.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45427279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How to schedule the Volleyball Nations League. R. Lambers, Laurent Rothuizen, F. Spieksma. Journal of Sports Analytics, doi:10.3233/jsa-220626, published 2023-07-03.
The Volleyball Nations League is the elite annual international competition in volleyball, with the sixteen best nations per gender contesting the trophy in a tournament that spans six weeks. The first five weeks contain a single round-robin tournament in which matches are played in different venues across the globe. As a consequence, each team follows an intensive travel plan, and large discrepancies between the travel burdens of opposing teams are common; this is considered a disadvantage for the team that travelled more. We analyse this problem and find that it is closely related to the well-known Social Golfer Problem: we name the resulting problem the Traveling Social Golfer Problem (TSGP). We propose a decomposition approach for the TSGP, leading to the Venue Assignment Problem and the Nation Assignment Problem. We prove that a solution to the Venue Assignment Problem determines the amount of unfairness, and that any solution of the Venue Assignment Problem can be extended to a solution of the Nation Assignment Problem satisfying the so-called home-venue property. Using integer programming methods, we find, for real-life instances, the fairest schedules with respect to the difference in travel distance.
{"title":"How to schedule the Volleyball Nations League","authors":"R. Lambers, Laurent Rothuizen, F. Spieksma","doi":"10.3233/jsa-220626","DOIUrl":"https://doi.org/10.3233/jsa-220626","url":null,"abstract":"The Volleyball Nations League is the elite annual international competition within volleyball, with the sixteen best nations per gender contesting the trophy in a tournament that spans over 6 weeks. The first five weeks contain a single round robin tournament, where matches are played in different venues across the globe. As a consequence, each team follows an intensive travel plan, where it happens quite often that there is a large discrepancy between travel burdens of opposing teams. This is considered a disadvantage for the team that travelled more. We analyse this problem, and find that it is closely related to the well-known Social Golfer Problem: we name the resulting problem the Traveling Social Golfer Problem (TSGP). We propose a decomposition approach for the TSGP, leading to the so-called Venue Assignment Problem and the Nation Assignment Problem. We prove that a solution to the Venue Assignment problem determines the amount of unfairness, and we also prove that any solution of the Venue Assignment problem can be extended to a solution to the Nation Assignment problem satisfying the so-called home-venue property. Using integer programming methods, we find, for real-life instances, the fairest schedules with respect to the difference in travel distance.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42244597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A causal approach for detecting team-level momentum in NBA games. Louis Weimer, Zachary C. Steinert-Threlkeld, K. Coltin. Journal of Sports Analytics, doi:10.3233/jsa-220592, published 2023-07-03.
This paper provides new evidence that team-level momentum exists in the National Basketball Association (NBA). The existence of momentum is one of the most prominent and longstanding questions in sports analytics. But for all its importance to announcers, coaches, and players, the existing literature has found little evidence of momentum in professional basketball. This paper exploits a natural experiment in the flow of basketball games: television (TV) timeouts. Since TV timeouts occur at points exogenous to momentum, they enable the measurement of the effect of pauses in the game separately from the effect of strategy changes. We find TV timeouts cause an 11.2% decline in the number of points that the team with momentum subsequently scores. This effect is robust to the size of a run, substitutions, and game context. This result has far-reaching implications for basketball strategy and for the understanding of momentum in sports more broadly.
{"title":"A causal approach for detecting team-level momentum in NBA games","authors":"Louis Weimer, Zachary C. Steinert-Threlkeld, K. Coltin","doi":"10.3233/jsa-220592","DOIUrl":"https://doi.org/10.3233/jsa-220592","url":null,"abstract":"This paper provides new evidence that team-level momentum exists in the National Basketball Association (NBA). The existence of momentum is one of the most prominent and longstanding questions in sports analytics. But for all its importance to announcers, coaches, and players, existing literature has found little evidence of momentum in professional basketball. This paper exploits a natural experiment in the flow of basketball games: television (TV) timeouts. Since TV timeouts occur at points exogenous to momentum, they enable the measurement of the effect of pauses in the game separate from the effect of strategy changes. We find TV timeouts cause an 11.2% decline in the number of points that the team with momentum subsequently scores. This effect is robust to the size of a run, substitutions, and game context. This result has far reaching implications in basketball strategy and the understanding of momentum in sports more broadly.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49668047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parametric modeling and analysis of NFL run plays. Preston Biro, S. Walker. Journal of Sports Analytics, doi:10.3233/jsa-220657, published 2023-06-12.
This paper is concerned with the modeling of run plays using data obtained from the NFL. Using a parametric regression model based on the skew-t distribution, we estimate the shifts from overall league averages for each team in the NFL. From the interpretation of the parameters we can investigate what the best teams are specifically doing to achieve better performance according to the criterion of average yards per play.
{"title":"Parametric modeling and analysis of NFL run plays","authors":"Preston Biro, S. Walker","doi":"10.3233/jsa-220657","DOIUrl":"https://doi.org/10.3233/jsa-220657","url":null,"abstract":"The paper is concerned with the modeling of run plays from data obtained from the NFL. Using a parametric regression model based on the skew–t distribution we estimate the shifts from overall league averages for each team within the NFL. From the interpretation of the parameters we can investigate what the best teams are specifically doing to achieve better performance according to the criterion of average yards per play.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47826350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predictions of European basketball match results with machine learning algorithms. Tzai Lampis, Ntzoufras Ioannis, Vassalos Vasilios, Dimitriou Stavrianna. Journal of Sports Analytics, doi:10.3233/jsa-220639, published 2023-03-31.
The goal of this paper is to build and compare methods for predicting the final outcomes of basketball games. In this study, we analyzed data from four different European tournaments: Euroleague, Eurocup, the Greek Basket League and the Spanish Liga ACB. The dataset consists of information collected from the box scores of 5214 games for the period 2013-2018. The predictions obtained by our implemented methods and models were compared with a “vanilla” model using only the team-name information of each game. In our analysis, we include new performance indicators constructed using historical statistics, key performance indicators and measurements from three rating systems (Elo, PageRank, pi-rating). For these three rating systems and every tournament under consideration, we tune the rating-system parameters using specific training datasets. These new game features improve our predictions and can be easily obtained in any basketball league. Our predictions were obtained by implementing three different statistical and machine learning algorithms: logistic regression, random forest, and extreme gradient boosting trees. Moreover, we report predictions based on the combination of these algorithms (ensemble learning). We evaluate our predictions using three predictive measures: Brier score, accuracy and F1-score. In addition, we evaluate the performance of our algorithms under three different prediction scenarios (full-season, mid-season, and play-off predictive evaluation). For the mid-season and play-off scenarios, we further explore whether incorporating additional results from previous seasons in the learning dataset enhances the predictive performance of the implemented models and algorithms. Concerning the results, there is no clear winner among the machine learning algorithms, since they provide nearly identical predictions with small differences. However, models with the predictors suggested in this paper outperform the “vanilla” model by 3-5% in terms of accuracy. Another conclusion, for the play-off scenarios, is that it is not necessary to embed outcomes from previous seasons in the training dataset; using data from the current season, most of the time, leads to efficient, accurate parameter learning and well-behaved prediction models. Moreover, the Greek league is the least balanced tournament in terms of competitiveness, since all our models achieve high predictive accuracy (78% for the best-performing model). The second least balanced league is the Spanish one, with accuracy reaching 72%, while for the two European tournaments the prediction accuracy is considerably lower (about 69%). Finally, we present the most important features by counting the percentage of appearance in every machine learning algorithm for each of the three analyses. From this analysis, we may conclude that the best predictors are the rating systems (pi-rating, PageRank, and Elo) and the current-form performance indicators (e.g.
{"title":"Predictions of european basketball match results with machine learning algorithms","authors":"Tzai Lampis, Ntzoufras Ioannis, Vassalos Vasilios, Dimitriou Stavrianna","doi":"10.3233/jsa-220639","DOIUrl":"https://doi.org/10.3233/jsa-220639","url":null,"abstract":"The goal of this paper is to build and compare methods for the prediction of the final outcomes of basketball games. In this study, we analyzed data from four different European tournaments: Euroleague, Eurocup, Greek Basket League and Spanish Liga ACB. The data-set consists of information collected from box scores of 5214 games for the period of 2013-2018. The predictions obtained by our implemented methods and models were compared with a “vanilla” model using only the team-name information of each game. In our analysis, we have included new performance indicators constructed by using historical statistics, key performance indicators and measurements from three rating systems (Elo, PageRank, pi-rating). For these three rating systems and every tournament under consideration, we tune the rating system parameters using specific training data-sets. These new game features are improving our predictions efficiently and can be easily obtained in any basketball league. Our predictions were obtained by implementing three different statistics and machine learning algorithms: logistic regression, random forest, and extreme gradient boosting trees. Moreover, we report predictions based on the combination of these algorithms (ensemble learning). We evaluate our predictions using three predictive measures: Brier Score, accuracy and F 1-score. In addition, we evaluate the performance of our algorithms with three different prediction scenarios (full-season, mid-season, and play-offs predictive evaluation). For the mid-season and the play-offs scenarios, we further explore whether incorporating additional results from previous seasons in the learning data-set enhances the predictive performance of the implemented models and algorithms. Concerning the results, there is no clear winner between the machine learning algorithms since they provide identical predictions with small differences. However, models with predictors suggested in this paper out-perform the “vanilla” model by 3-5% in terms of accuracy. Another conclusion from our results for the play-offs scenarios is that it is not necessary to embed outcomes from previous seasons in our training data-set. Using data from the current season, most of the time, leads to efficient, accurate parameter learning and well-behaved prediction models. Moreover, the Greek league is the least balanced tournament in terms of competitiveness since all our models achieve high predictive accuracy (78%, on the best-performing model). The second less balanced league is the Spanish one with accuracy reaching 72% while for the two European tournaments the prediction accuracy is considerably lower (about 69% ). Finally, we present the most important features by counting the percentage of appearance in every machine learning algorithm for every one of the three analyses. 
From this analysis, we may conclude that the best predictors are the rating systems (pi-rating, PageRank, and ELO) and the current form performance indicators (e.g","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45412242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
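Of the rating-system predictors, Elo is the simplest to illustrate: ratings are updated after each game and the pre-game rating difference becomes a feature for the classifiers. The sketch below uses conventional default parameters and invented results; the paper tunes the rating parameters per tournament on training data.
```python
# Sketch of the Elo rating system used as a game-level predictor. The K-factor,
# 400-point scale and home-advantage offset are conventional assumptions, and
# the game results are illustrative.
def elo_expected(r_home, r_away, home_adv=70.0):
    """Expected probability that the home team wins."""
    return 1.0 / (1.0 + 10.0 ** ((r_away - (r_home + home_adv)) / 400.0))

def elo_update(r_home, r_away, home_won, k=20.0, home_adv=70.0):
    p_home = elo_expected(r_home, r_away, home_adv)
    delta = k * (home_won - p_home)
    return r_home + delta, r_away - delta

ratings = {"OLY": 1500.0, "PAO": 1500.0, "CSKA": 1500.0}
games = [("OLY", "PAO", 1), ("CSKA", "OLY", 1), ("PAO", "CSKA", 0)]

features = []  # (pre-game Elo difference, outcome) pairs fed to the classifiers
for home, away, home_won in games:
    features.append((ratings[home] - ratings[away], home_won))
    ratings[home], ratings[away] = elo_update(ratings[home], ratings[away], home_won)

print(ratings)
print(features)
```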
Schedule inequity in the National Basketball Association. R. Alan Bowman, Oskar Harmon, Thomas Ashman. Journal of Sports Analytics, doi:10.3233/jsa-220629, published 2023-03-23.
Scheduling factors such as a visiting team playing a game back-to-back against a rested home team can affect the win probability of the teams for that game and potentially affect teams unevenly throughout the season. This study examines schedule inequity in the National Basketball Association (NBA) for the seasons 2000–01 through 2018–19. By schedule inequity, we mean the effect of a comprehensive set of schedule factors, other than opponents, on team success and how much these effects differ across teams. We use a logistic regression model and Monte Carlo simulations to identify schedule factor variables that influence the probability of the home team winning in each game (the teams playing are control variables) and construct schedule inequity measures. We evaluate these measures for each NBA season, trends in the measures over time, and the potential effectiveness of broad prescriptive approaches to reduce schedule inequity. We find that, although schedule equity has improved over time, schedule differences disproportionately affect team success measures. Moreover, we find that balancing the frequency of schedule variables across teams is a more effective method of mitigating schedule inequity than reducing the total frequency, although combining both methods is the most effective strategy.
{"title":"Schedule inequity in the National Basketball Association","authors":"R. Alan Bowman, Oskar Harmon, Thomas Ashman","doi":"10.3233/jsa-220629","DOIUrl":"https://doi.org/10.3233/jsa-220629","url":null,"abstract":"Scheduling factors such as a visiting team playing a game back-to-back against a rested home team can affect the win probability of the teams for that game and potentially affect teams unevenly throughout the season. This study examines schedule inequity in the National Basketball Association (NBA) for the seasons 2000–01 through 2018–19. By schedule inequity, we mean the effect of a comprehensive set of schedule factors, other than opponents, on team success and how much these effects differ across teams. We use a logistic regression model and Monte Carlo simulations to identify schedule factor variables that influence the probability of the home team winning in each game (the teams playing are control variables) and construct schedule inequity measures. We evaluate these measures for each NBA season, trends in the measures over time, and the potential effectiveness of broad prescriptive approaches to reduce schedule inequity. We find that, although schedule equity has improved over time, schedule differences disproportionately affect team success measures. Moreover, we find that balancing the frequency of schedule variables across teams is a more effective method of mitigating schedule inequity than reducing the total frequency, although combining both methods is the most effective strategy.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136166051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Randomness, Uncertainty, and the Optimal College Football Championship Tournament Size. Grace Muller, Samuel Hood, J. Sokol. Journal of Sports Analytics, doi:10.3233/jsa-220613, published 2023-03-04.
Every year, there is a popular debate over how many teams should take part in the NCAA’s FBS-level college football championship tournament, and especially whether it should be expanded from 4 teams to 8 or even 12. The inherent tradeoff is that the larger the tournament, the higher the probability that the true best team is included (“validity”), but the lower the probability that the true best team will avoid being upset and win the tournament (“effectiveness”). Using simulation based on empirically derived estimates of the ability to measure true team quality and the amount of randomness inherent in each game, we show that the effect of expanding the tournament to 8 teams could be very small: an effectiveness decrease of only 2-3% while increasing validity by 1-4%, with a 7-team tournament providing slightly better tradeoffs. A 12-team tournament would decrease effectiveness by 5-6%.
{"title":"Randomness, Uncertainty, and the Optimal College Football Championship Tournament Size","authors":"Grace Muller, Samuel Hood, J. Sokol","doi":"10.3233/jsa-220613","DOIUrl":"https://doi.org/10.3233/jsa-220613","url":null,"abstract":"Every year, there is a popular debate over how many teams should take part in the NCAA’s FBS-level college football championship tournament, and especially whether it should be expanded from 4 teams to 8 or even 12. The inherent tradeoff is that the larger the tournament, the higher the probability that the true best team is included (“validity”), but the lower the probability that the true best team will avoid being upset and win the tournament (“effectiveness”). Using simulation based on empiricially-derived estimates of the ability to measure true team quality and the amount of randomness inherent in each game, we show that the effect of expanding the tournament to 8 teams could be very small, an effectiveness decrease of only 2-3% while increasing validity by 1-4%, while a 7-team tournament provides slightly better tradeoffs. A 12-team tournament would decrease effectiveness by 5-6%.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48112189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifying Pacing Profiles in 2000 Metre World Championship Rowing. Dani Chu, M. Tsai, Ryan Sheehan, Jack Davis, R. Doig. Journal of Sports Analytics, doi:10.3233/jsa-220497, published 2023-03-04.
The pacing strategy adopted by athletes is a major determinant of success in timed competition. Various pacing profiles are reported in the literature, and their importance depends on the sport. However, in 2000 metre rowing, the definition of these pacing profiles has been limited by the minimal availability of data. Purpose: Our aim is to objectively identify pacing profiles used in World Championship 2000 metre rowing races using reproducible methods. Methods: We use the average speed for each 50 metre split for each available boat in every race of the Rowing World Championships from 2010-2017. These data were scraped from www.worldrowing.com, and the resulting data set is publicly available (https://github.com/danichusfu/rowing_pacing_profiles) to help the field of rowing research. Pacing profiles are determined using k-shape clustering, a time-series clustering method. A multinomial logistic regression is then fit to test whether variables such as boat size, gender, round, or rank are associated with pacing profiles. Results: Four pacing strategies (Even, Positive, Reverse J-Shaped, and U-Shaped) are identified from the clustering process. Boat size, round (heats vs finals), rank, gender, and weight class are all found to affect pacing profiles. Conclusion: We use an objective methodology with more granular data to identify four pacing strategies and identify important associations between these pacing profiles and race factors. Finally, we make the full data set public to further rowing research and to allow replication of our results.
{"title":"Identifying Pacing Profiles in 2000 Metre World Championship Rowing","authors":"Dani Chu, M. Tsai, Ryan Sheehan, Jack Davis, R. Doig","doi":"10.3233/jsa-220497","DOIUrl":"https://doi.org/10.3233/jsa-220497","url":null,"abstract":"The pacing strategy adopted by athletes is a major determinants of success during timed competition. Various pacing profiles are reported in the literature and its importance depends on the mode of sport. However, in 2000 metre rowing, the definition of these pacing profiles has been limited by the minimal availability of data. Purpose: Our aim is to objectively identify pacing profiles used in World Championship 2000 metre rowing races using reproducible methods. Methods: We use the average speed for each 50 metre split for each available boat in every race of the Rowing World Championships from 2010-2017. This data was scraped from www.worldrowing.com. This data set is publicly available (https://github.com/danichusfu/rowing_pacing_profiles) to help the field of rowing research. Pacing profiles are determined by using k-shape clustering, a time series clustering method. A multinomial logistic regression is then fit to test whether variables such as boat size, gender, round, or rank are associated with pacing profiles. Results: Four pacing strategies (Even, Positive, Reverse J-Shaped, and U-Shaped) are identified from the clustering process. Boat size, round (Heat vs Finals), rank, gender, and weight class are all found to affect pacing profiles. Conclusion: We use an objective methodology with more granular data to identify four pacing strategies. We identify important associations between these pacing profiles and race factors. Finally, we make the full data set public to further rowing research and to replicate our results.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47209640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}