
Journal of Sports Analytics: Latest Articles

Population-adjusted national rankings in the Olympics
IF 0.6 Q4 HOSPITALITY, LEISURE, SPORT & TOURISM Pub Date : 2024-07-12 DOI: 10.3233/jsa-240874
Robert C. Duncan, Andrew Parece
Ranking countries in the Olympic Games by medal counts clearly favors large-population countries over small ones, while ranking by medals-per-capita produces national rankings with very small population countries on top. We discuss why this happens, and propose a new national ranking system for the Olympics, also based upon medals won, which is inclusive in the sense that countries of widely-varying population can achieve high rankings. This population-adjusted probability ranking ranks countries by how much evidence they show for high capability at Olympic sports. In particular, it ranks countries according to how improbable their medal counts would be in an idealized reference model of the Games which posits that all medal-winning nations have equal propensity per capita for winning medals. The ranking index U is defined using a simple binomial sum. Here we explain the method, and we present population-adjusted national rankings for the last three summer Olympics (London 2012, Rio 2016 and Tokyo 2020, held in 2021). If the advantages of this ranking method come to be understood by sports media covering the Olympics and by the interested public, it could be widely reported alongside raw medal counts, thus adding excitement and interest to the Olympics.
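The abstract describes ranking countries by how improbable their medal counts would be under an equal-propensity-per-capita reference model, using a binomial sum. A minimal sketch of that idea follows. It assumes the index is an upper binomial tail probability with the country's population share as the success probability; the exact definition of U is given in the paper and may differ. The function name `binomial_tail` and all figures below are hypothetical.

```python
from math import comb


def binomial_tail(medals: int, total_medals: int, pop_share: float) -> float:
    """P(X >= medals) for X ~ Binomial(total_medals, pop_share).

    Assumption: under the reference model each medal goes to a country
    with probability equal to its share of the combined population.
    """
    return sum(
        comb(total_medals, k) * pop_share**k * (1 - pop_share) ** (total_medals - k)
        for k in range(medals, total_medals + 1)
    )


# Hypothetical data: (medals won, population share). Not from the paper.
total = 100
countries = {
    "Small": (3, 0.001),
    "Medium": (20, 0.05),
    "Large": (40, 0.40),
}

# A smaller tail probability means a less likely medal count, hence a higher rank.
ranking = sorted(
    countries,
    key=lambda name: binomial_tail(countries[name][0], total, countries[name][1]),
)
print(ranking)
```

In this toy data the mid-sized country ranks first: its 20 medals are far more surprising under the reference model than the small country's 3 or the large country's 40, which illustrates how countries of very different population can reach high ranks.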
Citations: 0
Sabermetrics by the sea: Evaluating college players with the Cape Cod Baseball League
IF 1.1 Pub Date : 2024-04-22 DOI: 10.3233/jsa-240771
Humbert Kilanowski, Thomas Moloney
From the dawn of the “Moneyball” system of searching for players with undervalued skills, an increasing proportion of players chosen in the Major League draft has come from the collegiate ranks, and while every professional team has an analytics department, the draft remains the last frontier for identifying and acquiring the best prospective players. Thus, it has become more important in recent years to evaluate college players properly. Because players’ statistics during the college season can vary wildly with differing levels of competition, a more objective metric for measuring college players’ skills is needed. We propose that the most effective metric for doing so comes from observing players’ performances during the summer, when the variable of strength of schedule can be directly controlled, as players of the same skill level compete against each other. Our study focuses on the Cape Cod Baseball League (CCBL), a prestigious summer league that attracts the most talented college players, from which many players are drafted into the Majors every year. Our reasons for choosing the CCBL are the aforementioned homogeneity of talent; the lack of effects of travel fatigue, as the teams all play in a concentrated geographical area; and the league’s built-in replacement level, as temporary players often fill roster spots for players who had been selected the previous autumn, but whose college teams have advanced to the College World Series or who play on a national team during part of the CCBL season. This replacement level is used to calculate a metric of Wins Above Replacement, which we call cWAR.
Citations: 0
A multi-season machine learning approach to examine the training load and injury relationship in professional soccer
IF 1.1 Pub Date : 2024-04-22 DOI: 10.3233/jsa-240718
Aritra Majumdar, Rashid Bakirov, Dan Hodges, Sean McCullagh, Tim Rees
OBJECTIVES: The purpose of this study was to use machine learning to examine the relationship between training load and soccer injury with a multi-season dataset from one English Premier League club. METHODS: Participants were 35 male professional soccer players (aged 25.79±3.75 years, range 18–37 years; height 1.80±0.07 m, range 1.63–1.95 m; weight 80.70±6.78 kg, range 66.03–93.70 kg), with data collected from the 2014–2015 season until the 2018–2019 season. A total of 106 training load variables (40 GPS data, 6 personal information, 14 physical data, 4 psychological data and 14 ACWR, 14 MSWR and 14 EWMA data) were examined in relation to 133 non-contact injuries, with a high imbalance ratio of 0.013. RESULTS: XGBoost and Artificial Neural Network were implemented to train the machine learning models using four and a half seasons’ data, with the developed models subsequently tested on the following half season’s data. During the first four and a half seasons, there were 341 injuries; during the next half season there were 37 injuries. To interpret and visualize the output of each model and the contribution of each feature (i.e., training load) towards the model, we used the Shapley Additive Explanations (SHAP) approach. Of 37 injuries, XGBoost correctly predicted 26 injuries, with recall and precision of 73% and 10% respectively. Artificial Neural Network correctly predicted 28 injuries, with recall and precision of 77% and 13% respectively. In the model using Artificial Neural Network (the relatively more accurate model), last injury area and weight appeared to be the most important features contributing to the prediction of injury. CONCLUSIONS: This was the first study of its kind to use Artificial Neural Network and a multi-season dataset for injury prediction. Our results demonstrate the potential to predict injuries with high recall, thereby identifying most of the injury cases, although precision suffered because of the high class imbalance. This approach to using machine learning provides potentially valuable insights for soccer organizations and practitioners when monitoring load injuries.
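The trade-off reported above, high recall with low precision under heavy class imbalance, can be illustrated with the standard definitions of the two measures. The sketch below uses hypothetical counts chosen only for illustration; they are not the study's confusion matrix, and the function name `precision_recall` is an assumption.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts, guarding against empty denominators."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts (not from the study): 40 injuries, 30 of them
# flagged by the model, alongside 270 false alarms on non-injury cases.
precision, recall = precision_recall(tp=30, fp=270, fn=10)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Because injuries are rare relative to non-injury observations, even a modest false-alarm rate on the majority class produces many false positives, which pulls precision down while recall stays high.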
Citations: 0
Performance analysis in top handball matches in the seasons before, during, and after the COVID-19 pandemic
IF 1.1 Pub Date : 2024-03-01 DOI: 10.3233/jsa-240769
Paweł Krawczyk, Mateusz Szczerba, Jan Labiński, Maksymilian Smoliński
The aim of the study was to determine whether there are differences in performance analysis in handball between the Pre-COVID-19, COVID-19, and Post-COVID-19 seasons. The study material was obtained from the official match statistics of PGNiG Super league Ltd. Matches were played in the 2019/2020 season (Pre-COVID-19), the 2020/2021 season (COVID-19), and the 2021/2022 season (Post-COVID-19). The Mann-Whitney U test was used for comparisons between two groups, and the Kruskal-Wallis test for comparisons among three groups. In the Pre-COVID-19 season, players made an average of 1.3 more 9-meter throws than in the Post-COVID-19 season. The Post-COVID-19 season is characterized by higher 6-meter goal and 6-meter throw counts than the Pre-COVID-19 season. The results show higher goalkeeper effectiveness on 7-meter throws in the Pre-COVID-19 season than in the COVID-19 season. The increasing number of throws and goals from 6 meters, along with a decrease in the number of throws from 9 meters, indicates the latest trends in handball. A reduction in the number of offensive fouls, together with an increase in the number of fast attacks and in the effectiveness of goalkeepers’ interventions from 7 meters in the second round of the COVID-19 season, indicates the adaptation of players to the new conditions created by the pandemic.
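For readers unfamiliar with the two-group comparison named above, the following is a minimal sketch of the Mann-Whitney U statistic computed by direct pair counting. It covers only the statistic, not the p-value, and the per-match values are hypothetical rather than taken from the study.

```python
def mann_whitney_u(xs: list[float], ys: list[float]) -> float:
    """U statistic for xs versus ys: the number of (x, y) pairs
    with x > y, counting each tie as one half."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical 9-meter throw counts per match in two seasons.
pre = [12, 15, 14, 16]
post = [10, 13, 11, 14]

u_pre = mann_whitney_u(pre, post)
u_post = mann_whitney_u(post, pre)
print(u_pre, u_post)
```

The two statistics always sum to the number of pairs, so a value of `u_pre` close to that maximum suggests the first sample tends to be larger. In practice a statistics library such as SciPy (`scipy.stats.mannwhitneyu`, `scipy.stats.kruskal`) would typically be used to obtain p-values.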
Citations: 0
Modeling and prediction of tennis matches at Grand Slam tournaments
IF 1.1 Pub Date : 2024-02-29 DOI: 10.3233/jsa-240670
N. Buhamra, A. Groll, S. Brunner
In this manuscript, different approaches for modeling and predicting tennis matches in Grand Slam tournaments are proposed. The data contain information on 5,013 matches in men’s Grand Slam tournaments from the years 2011–2022. All approaches considered are based on regression models of the probability that the first-named player wins. Several potential covariates are considered, including the players’ age, ATP ranking and points, odds, and Elo rating, as well as two additional age variables that take into account that the optimal age of a tennis player is between 28 and 32 years. We compare the different regression approaches with respect to three performance measures, namely classification rate, predictive Bernoulli likelihood, and Brier score, in a 43-fold cross-validation-type approach for the matches of the years 2011 to 2021. The five models with the highest average ranks are then selected. To predict the results of the 2022 tournaments and compare them with the actual results, a comparison over a continuously updated data set via a “rolling window” strategy is used, and the previously mentioned performance measures are calculated again. Additionally, we examine whether the assumption of non-linear effects or additional court- and player-specific abilities is reasonable.
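The three performance measures named above can be sketched for binary match outcomes as follows. These are common textbook forms and are an assumption here; the paper's exact definitions (for example, how ties at a predicted probability of 0.5 are handled) may differ. The predicted probabilities and outcomes are hypothetical.

```python
def classification_rate(probs: list[float], outcomes: list[int]) -> float:
    """Share of matches where the predicted favourite (p > 0.5) matches the outcome."""
    return sum((p > 0.5) == bool(y) for p, y in zip(probs, outcomes)) / len(probs)


def mean_bernoulli_likelihood(probs: list[float], outcomes: list[int]) -> float:
    """Average probability the model assigned to the outcome that occurred."""
    return sum(p if y else 1 - p for p, y in zip(probs, outcomes)) / len(probs)


def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical predictions that the first-named player wins, and actual results (1 = win).
probs = [0.8, 0.6, 0.3, 0.9]
outcomes = [1, 0, 0, 1]

print(classification_rate(probs, outcomes))
print(mean_bernoulli_likelihood(probs, outcomes))
print(brier_score(probs, outcomes))
```

Classification rate and Bernoulli likelihood reward higher values, while the Brier score rewards lower values, so ranking models across the three measures requires flipping the direction for the Brier score.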
Citations: 0
Factors associated with match outcomes in elite European football – insights from machine learning models
IF 1.1 Pub Date : 2024-02-27 DOI: 10.3233/jsa-240745
Maxime Settembre, Martin Buchheit, K. Hader, Ray Hamill, Adrien Tarascon, Raymond Verheijen, Derek McHugh
AIM: To examine the factors affecting European Football match outcomes using machine learning models. METHODS: Fixtures of 269 teams competing in the top seven European leagues were extracted (2001/02 to 2021/22, total >61,000 fixtures). We used eXtreme Gradient Boosting (XGBoost) to assess the relationship between result (win, draw, loss) and the explanatory variables. RESULTS: The top contributors to match outcomes were travel distance, between-team differences in Elo (with a contribution magnitude to the model half of that of travel distance and match location), and recent domestic performance (with a contribution magnitude of a fourth to a third of that of travel distance and match location), irrespective of the dataset and context analyzed. Contextual factors such as rest days between matches, the number of matches since the managers have been in charge, and match-to-match player rotations were also shown to influence match outcomes; however, their contribution magnitude was consistently 4–8 times smaller than that of the three main contributors mentioned above. CONCLUSIONS: Machine learning has proven to provide insightful results for coaches and supporting staff who may use their results to set expectations and adjust their practices in relation to the different contexts examined here.
Citations: 0
Finding repeatable progressive pass clusters and application in international football
IF 1.1 Pub Date : 2024-01-22 DOI: 10.3233/jsa-220732
Bikash Deb, Javier Fernández Navarro, A. McRobert, Ian Jarman
Progressive passing in football (soccer) is a key aspect in creating positive possession outcomes. Whilst this is well established, there is no consistent way to describe the different types of progressive passes. We expand on the previous literature, providing a complete methodological approach to progressive pass clustering, from selecting the number of clusters (k) to risk-reward profiling of the resulting progressive pass types. In this paper the Separation and Concordance (SeCo) framework is utilised to provide a process for analysing k-means clustering solutions in a more repeatable way. The results demonstrate that stable progressive pass clusters can be found in international football and show their efficacy: the progressive passes “Mid Central to Mid Half Space” in build-up and “Mid Half Space to Final Central” into the final third have the best balance between risk (turnover) and reward (shot created) in the subsequent possession. This allowed for opposition profiling of player and team patterns in different phases of play, with a case study presented for the teams in the Last 16 of the 2022 World Cup.
Citations: 0
Winner prediction in an ongoing one day international cricket match
IF 1.1 Pub Date : 2024-01-09 DOI: 10.3233/jsa-220735
Yash Agrawal, Kundan Kandhway
Cricket is a team sport with an intricate set of rules, where players specialize in multiple skills such as batting, bowling, and fielding. Playing conditions and home advantage also impact the game. Thus, it is quite challenging to build an accurate quantitative model for the game. In this paper, we provide a data-driven approach to predict the winner of a cricket match. We divide the ongoing match into various states and provide a prediction for each state using supervised machine learning models. We employ dynamic features that account for the current match situation, together with static features like team strength, winner of the toss, and home advantage. We also use SHAP scores, an explainable AI technique, to interpret the proposed prediction model. We use ball-by-ball data from 1359 men’s one day international cricket matches played between January 2004 and January 2022 to present our results. We achieved a best in-play prediction accuracy of about 85%. SHAP scores reveal that during the initial phases of the match, the model treats static features like team strength as more important than others in making its predictions. But as the match progresses, dynamic features capturing the current match situation become exceedingly important. Our work may be useful in preparing tools for in-play winner prediction for live cricket matches that can be used in websites and mobile applications covering the sport, in providing analytics during live television commentary, and in legal betting platforms.
Citations: 0
Quantitative analysis of professional basketball: A qualitative discussion
IF 1.1 Pub Date : 2024-01-08 DOI: 10.3233/jsa-220713
Yukun Zhou, Tianyi Li
Quantitative analysis of professional basketball has become an attractive field for experienced data analysts, and the recent availability of high-resolution datasets has pushed data-driven basketball analytics further. We present a qualitative discussion of quantitative analysis in professional basketball. We propose and discuss the dimensions, the levels of granularity, and the types of tasks in quantitative basketball. We review key literature from the past two decades and map it onto the proposed qualitative framework, with an evolutionary perspective and an emphasis on recent advances. We list questions about professional basketball that could be approached with quantitative tools, pointing to directions for future research. We also touch on the new landscape of virtual basketball and how it enriches the space for quantitative analysis. This report serves as a qualitative primer for quantitative analysis of professional basketball, exhibiting the growing prospects of this promising research area.
Citations: 0
A goal-aligned coordinate system for invasion games
IF 1.1 Pub Date : 2023-11-30 DOI: 10.3233/jsa-220706
Ulrik Brandes
Spatial locations of players and game devices are a fundamental data type in team-sports analytics. They are typically specified in Cartesian coordinates, but with varying conventions for the origin, orientation, and scaling. In invasion games such as football, basketball, or hockey, however, many markings are of fixed dimension even when the field of play is not, so that the game-specific meaning of locations does not scale uniformly. We propose an alternative coordinate system that accommodates variable field sizes by using the goals instead of a corner or the center of the field of play as frames of reference.
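The scaling problem described above can be illustrated with a small sketch. The abstract does not give the paper's actual definition, so the representation below (distance to each goal line plus a lateral offset) and the names `GoalRelative` and `goal_relative` are assumptions made for illustration only. It shows that a fixed marking such as the football penalty mark, 11 m from the goal line, has different centre-origin coordinates on fields of different length but the same goal-relative distance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalRelative:
    to_own_goal_line: float  # metres from the defended goal line
    to_opp_goal_line: float  # metres from the attacked goal line
    lateral: float           # signed metres from the line joining the goal centres


def goal_relative(x, y, field_length):
    """Convert centre-origin Cartesian metres to goal-relative distances.

    Assumes the origin is the centre of the field and the x-axis points
    toward the attacked goal (an illustrative convention, not the paper's).
    """
    half = field_length / 2.0
    return GoalRelative(
        to_own_goal_line=x + half,
        to_opp_goal_line=half - x,
        lateral=y,
    )


# Penalty mark on a 105 m field and on a 100 m field: different x, same meaning.
print(goal_relative(41.5, 0.0, 105.0))
print(goal_relative(39.0, 0.0, 100.0))
```

Measuring from the goals keeps locations near fixed-size markings comparable across venues, which uniform rescaling of centre-origin coordinates would not.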
Citations: 0