Journal of Sports Analytics: Latest Articles

Community structure of the football transfer market network: the case of Italian Serie A
Pub Date : 2023-11-09 DOI: 10.3233/jsa-220661
Lucio Palazzo, Roberto Rondinelli, Filipe Manuel Clemente, Riccardo Ievoli, Giancarlo Ragozini
The men’s football transfer market represents a complex phenomenon requiring suitable methods for an in-depth study. Network Analysis may be employed to measure the key elements of the transfer market through network indicators, such as degree centrality, hub and authority scores, and betweenness centrality. Furthermore, community detection methods can be proposed to unveil unobservable patterns of the football market, even considering auxiliary variables such as the type of transfer, the age or the role of the player, and the agents involved in the transfer flow. These methodologies are applied to the flows of player transfers generated by the 20 teams of the Italian first division (Serie A). These flows include teams from all over the world. We consider the summer market session of 2019, at the beginning of the 2019-2020 season. Results also help to better understand some peculiarities of the Italian football transfer market in terms of the different approaches of the elite teams. Network indices show the presence of different market strategies, highlighting the role of mid-level teams such as Atalanta, Genoa, and Sassuolo. The network reveals a core-periphery structure split into several communities. The Infomap algorithm identifies 14 single team-based communities and three communities formed by two teams. Two of the latter are composed of a top team and a mid-level team, suggesting the presence of collaboration and similar market behavior, while the third is guided by two teams promoted from the second division (Serie B).
Citations: 0
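The degree-centrality indicator mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the transfer market is stored as directed, weighted edges from selling club to buying club; the edge list, the function name `degree_centrality`, and the player counts are hypothetical toy values, not data or code from the paper.

```python
from collections import defaultdict

# Hypothetical toy transfers (seller, buyer, number_of_players); not data from the paper.
transfers = [
    ("Atalanta", "Juventus", 2),
    ("Genoa", "Juventus", 1),
    ("Sassuolo", "Atalanta", 1),
    ("Atalanta", "Genoa", 3),
]

def degree_centrality(edges):
    """Return weighted out-degree (players sold) and in-degree (players bought) per club."""
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for seller, buyer, n_players in edges:
        out_deg[seller] += n_players
        in_deg[buyer] += n_players
    clubs = set(out_deg) | set(in_deg)
    return {club: {"out": out_deg[club], "in": in_deg[club]} for club in clubs}

centrality = degree_centrality(transfers)
```

A club with a high out-degree and a low in-degree acts mainly as a seller in this toy network; hub and authority scores or betweenness would need an iterative or path-based computation that this sketch does not attempt.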
Determining the playing 11 based on opposition squad: An IPL illustration
Pub Date : 2023-11-09 DOI: 10.3233/jsa-220638
G. Gokul, Malolan Sundararaman
The Indian Premier League (IPL) is the most popular T20 domestic sporting league globally. Player selection is crucial to winning the competitive IPL tournament. Team management selects 11 players for each match from a team’s squad of 15 to 25 players, and different player statistics are analysed to select the best playing 11 for each match. This study attempts an approach in which on-field player performance is used to determine the playing-11. A player’s on-field performance in a match is computed as a single metric that considers the player’s attributes against every player present in the opposition squad. For this computation, past ball-by-ball data is cleaned and mined to generate data containing player-vs-player performance attributes. Next, the various performance attributes for a player-vs-player combination are converted into a player’s performance rating by computing a weighted score of the performance attributes. Finally, an optimisation model is proposed and developed to determine the best playing-11 using the computed performance ratings. The developed optimisation model suggests the playing-11 that maximises the possibility of winning against a given opponent. The proposed procedure to determine the playing-11 for an IPL match is demonstrated using past data from 2008-20. The demonstration indicates that for matches in the league stage, the playing-11 suggested by the model and the actual playing-11 have a ∼7% similarity across all teams. The remaining ∼3% are different from those selected in the actual team. Nevertheless, this difference yields an improvement of approximately 13.32% in performance rating compared with the actual team.
Citations: 0
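The rating step described in the abstract above (a weighted score of performance attributes, followed by selection of 11 players) can be sketched as follows. This is a simplified illustration under stated assumptions: the attribute names, the weights, and the greedy top-11 pick are all hypothetical, and the greedy pick is only a stand-in for the paper's optimisation model, which is not specified in the abstract.

```python
def performance_rating(attributes, weights):
    """Weighted score of a player's performance attributes (names are hypothetical)."""
    return sum(weights[name] * value for name, value in attributes.items())

def pick_playing_11(squad_ratings, size=11):
    """Greedy stand-in for an optimisation model: take the highest-rated players."""
    ranked = sorted(squad_ratings, key=squad_ratings.get, reverse=True)
    return ranked[:size]

# Assumed weights: reward strike rate, penalise dismissal rate.
weights = {"strike_rate": 0.6, "dismissal_rate": -0.4}

# Hypothetical 15-player squad with made-up attribute values.
squad = {
    f"player_{i}": performance_rating(
        {"strike_rate": 100 + i, "dismissal_rate": 10}, weights
    )
    for i in range(15)
}
playing_11 = pick_playing_11(squad)
```

A real selection model would normally add constraints that this sketch ignores, such as a minimum number of bowlers or a wicket-keeper, which is why an optimisation formulation is a more natural fit than a plain sort.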
Decision making for basketball clutch shots: A data driven approach
IF 1.1 Pub Date : 2023-08-23 DOI: 10.3233/jsa-220682
Yuval Eppel, M. Kaspi, Amichai Painsky
Decision making is considered one of the most important aspects of winning a basketball game. In the final minutes of the game (clutch time), these decisions become even more crucial. In particular, who should take the final, game-winning shots? While some coaches believe it is the team’s star, others may prefer the ‘clutch’ player (who seemingly performs better in clutch time), or the ‘hot’ player who was having a great game that night. In this work we study policy making in clutch minutes. Specifically, we introduce different policies for choosing the shot-taker (for example, according to field goal percentage). Then, we compare the policies and rank them to create a policy hierarchy, which serves as a decision guide for the coach. We show that when our recommendations are implemented (i.e., the highest ranked player takes the shot) the success rate is significantly greater: 51.2%, compared to 41.3% in commonly taken clutch shots. Furthermore, our results indicate that players who excelled in past clutch shots are more likely to succeed, independently of their performance in the current game.
Citations: 0
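The idea of a shot-taker policy in the abstract above can be made concrete with a small sketch. It assumes one simple policy (pick the player with the highest field goal percentage) and one simple evaluation (the success rate over historical shots taken by the policy's recommended player); the record layout, the function names, and the toy numbers are hypothetical and are not the paper's data or procedure.

```python
def fg_pct_policy(players):
    """Pick the player with the highest field goal percentage (made / attempted)."""
    return max(players, key=lambda name: players[name]["made"] / players[name]["attempted"])

def policy_success_rate(shots, policy):
    """Success rate over shots that were actually taken by the policy's recommended player."""
    followed = [s for s in shots if s["shooter"] == policy(s["lineup"])]
    if not followed:
        return None  # the policy was never followed in this sample
    return sum(s["made"] for s in followed) / len(followed)

# Hypothetical season totals for two players on the floor.
lineup = {
    "A": {"made": 50, "attempted": 100},
    "B": {"made": 30, "attempted": 80},
}

# Hypothetical clutch shots: who shot, and whether it went in (1) or not (0).
shots = [
    {"lineup": lineup, "shooter": "A", "made": 1},
    {"lineup": lineup, "shooter": "B", "made": 0},
    {"lineup": lineup, "shooter": "A", "made": 0},
]

recommended = fg_pct_policy(lineup)
rate = policy_success_rate(shots, fg_pct_policy)
```

Computing such a rate for several policies and sorting them would give a ranking in the spirit of the policy hierarchy the abstract describes, although the paper's actual comparison method may differ.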
How to schedule the Volleyball Nations League
IF 1.1 Pub Date : 2023-07-03 DOI: 10.3233/jsa-220626
R. Lambers, Laurent Rothuizen, F. Spieksma
The Volleyball Nations League is the elite annual international competition within volleyball, with the sixteen best nations per gender contesting the trophy in a tournament that spans 6 weeks. The first five weeks contain a single round robin tournament, where matches are played in different venues across the globe. As a consequence, each team follows an intensive travel plan, and quite often there is a large discrepancy between the travel burdens of opposing teams. This is considered a disadvantage for the team that travelled more. We analyse this problem, and find that it is closely related to the well-known Social Golfer Problem: we name the resulting problem the Traveling Social Golfer Problem (TSGP). We propose a decomposition approach for the TSGP, leading to the so-called Venue Assignment Problem and the Nation Assignment Problem. We prove that a solution to the Venue Assignment Problem determines the amount of unfairness, and we also prove that any solution of the Venue Assignment Problem can be extended to a solution to the Nation Assignment Problem satisfying the so-called home-venue property. Using integer programming methods, we find, for real-life instances, the fairest schedules with respect to the difference in travel distance.
Citations: 0
A causal approach for detecting team-level momentum in NBA games
IF 1.1 Pub Date : 2023-07-03 DOI: 10.3233/jsa-220592
Louis Weimer, Zachary C. Steinert-Threlkeld, K. Coltin
This paper provides new evidence that team-level momentum exists in the National Basketball Association (NBA). The existence of momentum is one of the most prominent and longstanding questions in sports analytics. But for all its importance to announcers, coaches, and players, existing literature has found little evidence of momentum in professional basketball. This paper exploits a natural experiment in the flow of basketball games: television (TV) timeouts. Since TV timeouts occur at points exogenous to momentum, they enable the measurement of the effect of pauses in the game separate from the effect of strategy changes. We find TV timeouts cause an 11.2% decline in the number of points that the team with momentum subsequently scores. This effect is robust to the size of a run, substitutions, and game context. This result has far-reaching implications for basketball strategy and for the understanding of momentum in sports more broadly.
Citations: 1
Parametric modeling and analysis of NFL run plays
IF 1.1 Pub Date : 2023-06-12 DOI: 10.3233/jsa-220657
Preston Biro, S. Walker
The paper is concerned with the modeling of run plays from data obtained from the NFL. Using a parametric regression model based on the skew–t distribution we estimate the shifts from overall league averages for each team within the NFL. From the interpretation of the parameters we can investigate what the best teams are specifically doing to achieve better performance according to the criterion of average yards per play.
Citations: 0
Predictions of european basketball match results with machine learning algorithms
IF 1.1 Pub Date : 2023-03-31 DOI: 10.3233/jsa-220639
Tzai Lampis, Ntzoufras Ioannis, Vassalos Vasilios, Dimitriou Stavrianna
The goal of this paper is to build and compare methods for the prediction of the final outcomes of basketball games. In this study, we analyzed data from four different European tournaments: Euroleague, Eurocup, Greek Basket League and Spanish Liga ACB. The data-set consists of information collected from box scores of 5214 games for the period of 2013-2018. The predictions obtained by our implemented methods and models were compared with a “vanilla” model using only the team-name information of each game. In our analysis, we have included new performance indicators constructed by using historical statistics, key performance indicators and measurements from three rating systems (Elo, PageRank, pi-rating). For these three rating systems and every tournament under consideration, we tune the rating system parameters using specific training data-sets. These new game features improve our predictions efficiently and can be easily obtained in any basketball league. Our predictions were obtained by implementing three different statistics and machine learning algorithms: logistic regression, random forest, and extreme gradient boosting trees. Moreover, we report predictions based on the combination of these algorithms (ensemble learning). We evaluate our predictions using three predictive measures: Brier Score, accuracy and F1-score. In addition, we evaluate the performance of our algorithms with three different prediction scenarios (full-season, mid-season, and play-offs predictive evaluation). For the mid-season and the play-offs scenarios, we further explore whether incorporating additional results from previous seasons in the learning data-set enhances the predictive performance of the implemented models and algorithms. Concerning the results, there is no clear winner between the machine learning algorithms since they provide nearly identical predictions, with only small differences. However, models with the predictors suggested in this paper out-perform the “vanilla” model by 3-5% in terms of accuracy. Another conclusion from our results for the play-offs scenarios is that it is not necessary to embed outcomes from previous seasons in our training data-set. Using data from the current season, most of the time, leads to efficient, accurate parameter learning and well-behaved prediction models. Moreover, the Greek league is the least balanced tournament in terms of competitiveness since all our models achieve high predictive accuracy (78% on the best-performing model). The second least balanced league is the Spanish one, with accuracy reaching 72%, while for the two European tournaments the prediction accuracy is considerably lower (about 69%). Finally, we present the most important features by counting the percentage of appearance in every machine learning algorithm for every one of the three analyses. From this analysis, we may conclude that the best predictors are the rating systems (pi-rating, PageRank, and Elo) and the current form performance indicators (e.g., the two most frequent ones are Hollinger’s Game Score and Floor Impact Counter).
Citations: 1
Schedule inequity in the National Basketball Association
Pub Date : 2023-03-23 DOI: 10.3233/jsa-220629
R. Alan Bowman, Oskar Harmon, Thomas Ashman
Scheduling factors such as a visiting team playing a game back-to-back against a rested home team can affect the win probability of the teams for that game and potentially affect teams unevenly throughout the season. This study examines schedule inequity in the National Basketball Association (NBA) for the seasons 2000–01 through 2018–19. By schedule inequity, we mean the effect of a comprehensive set of schedule factors, other than opponents, on team success and how much these effects differ across teams. We use a logistic regression model and Monte Carlo simulations to identify schedule factor variables that influence the probability of the home team winning in each game (the teams playing are control variables) and construct schedule inequity measures. We evaluate these measures for each NBA season, trends in the measures over time, and the potential effectiveness of broad prescriptive approaches to reduce schedule inequity. We find that, although schedule equity has improved over time, schedule differences disproportionately affect team success measures. Moreover, we find that balancing the frequency of schedule variables across teams is a more effective method of mitigating schedule inequity than reducing the total frequency, although combining both methods is the most effective strategy.
Citations: 0
Randomness, Uncertainty, and the Optimal College Football Championship Tournament Size
IF 1.1 Pub Date : 2023-03-04 DOI: 10.3233/jsa-220613
Grace Muller, Samuel Hood, J. Sokol
Every year, there is a popular debate over how many teams should take part in the NCAA’s FBS-level college football championship tournament, and especially whether it should be expanded from 4 teams to 8 or even 12. The inherent tradeoff is that the larger the tournament, the higher the probability that the true best team is included (“validity”), but the lower the probability that the true best team will avoid being upset and win the tournament (“effectiveness”). Using simulation based on empirically derived estimates of the ability to measure true team quality and the amount of randomness inherent in each game, we show that the effect of expanding the tournament to 8 teams could be very small: an effectiveness decrease of only 2-3% and a validity increase of 1-4%, while a 7-team tournament provides slightly better tradeoffs. A 12-team tournament would decrease effectiveness by 5-6%.
Citations: 0
Identifying Pacing Profiles in 2000 Metre World Championship Rowing
IF 1.1 Pub Date : 2023-03-04 DOI: 10.3233/jsa-220497
Dani Chu, M. Tsai, Ryan Sheehan, Jack Davis, R. Doig
The pacing strategy adopted by athletes is a major determinant of success during timed competition. Various pacing profiles are reported in the literature, and their importance depends on the mode of sport. However, in 2000 metre rowing, the definition of these pacing profiles has been limited by the minimal availability of data. Purpose: Our aim is to objectively identify pacing profiles used in World Championship 2000 metre rowing races using reproducible methods. Methods: We use the average speed for each 50 metre split for each available boat in every race of the Rowing World Championships from 2010-2017. These data were scraped from www.worldrowing.com. The data set is publicly available (https://github.com/danichusfu/rowing_pacing_profiles) to help the field of rowing research. Pacing profiles are determined by using k-shape clustering, a time series clustering method. A multinomial logistic regression is then fit to test whether variables such as boat size, gender, round, or rank are associated with pacing profiles. Results: Four pacing strategies (Even, Positive, Reverse J-Shaped, and U-Shaped) are identified from the clustering process. Boat size, round (Heat vs Finals), rank, gender, and weight class are all found to affect pacing profiles. Conclusion: We use an objective methodology with more granular data to identify four pacing strategies. We identify important associations between these pacing profiles and race factors. Finally, we make the full data set public to further rowing research and to allow our results to be replicated.
Citations: 0