A data-driven method for in-game decision making in MLB: when to pull a starting pitcher

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. Pub Date: 2013-08-11. DOI: 10.1145/2487575.2487660

Gartheeban Ganeshapillai, J. Guttag


Citations: 16

Abstract

Professional sports is a roughly $500 billion industry that is increasingly data-driven. In this paper, we show how machine learning can be applied to generate a model that could lead to better on-field decisions by managers of professional baseball teams. Specifically, we show how to use regularized linear regression to learn pitcher-specific predictive models that can help decide when a starting pitcher should be replaced. A key step in the process is our method of converting categorical variables (e.g., the venue in which a game is played) into continuous variables suitable for the regression. Another key step is dealing with situations in which there is too little data to compute measures such as the effectiveness of a pitcher against specific batters. For each season, we trained on the first 80% of the games and tested on the rest. The results suggest that using our model could have led to better decisions than those made by major league managers. Applying our model would have led to a different decision 48% of the time. For those games in which a manager left in a pitcher that our model would have removed, the pitcher ended up performing poorly 60% of the time.
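The abstract names three ingredients: turning a categorical variable into a continuous one, coping with sparse data, and fitting a regularized linear regression on a chronological 80/20 split. The sketch below illustrates one plausible way to combine them; it is not the authors' method, which the abstract does not spell out. The shrunken-mean encoding, the ridge (L2) penalty, the feature names (`venues`, `pitch_count`, `runs_next_inning`), and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def shrunken_category_means(categories, targets, prior_strength=5.0):
    """Map each category to a mean of the target, shrunk toward the global mean.

    Categories with few observations are pulled toward the global mean,
    which is one simple way to handle sparse data (an assumption here,
    not necessarily what the paper does).
    """
    global_mean = float(np.mean(targets))
    encoding = {}
    for cat in set(categories):
        values = [t for c, t in zip(categories, targets) if c == cat]
        n = len(values)
        encoding[cat] = (sum(values) + prior_strength * global_mean) / (n + prior_strength)
    return encoding, global_mean


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    weights = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def chronological_split(n_games, train_fraction=0.8):
    """Return slices for the first `train_fraction` of games and the rest."""
    cut = int(n_games * train_fraction)
    return slice(0, cut), slice(cut, n_games)


if __name__ == "__main__":
    # Synthetic, hypothetical data for a single pitcher's season.
    rng = np.random.default_rng(0)
    n_games = 50
    venues = rng.choice(["home", "away_park_1", "away_park_2"], size=n_games).tolist()
    pitch_count = rng.integers(60, 110, size=n_games).astype(float)
    runs_next_inning = rng.poisson(0.5, size=n_games).astype(float)

    train, test = chronological_split(n_games)

    # Fit the encoding on training games only, so test outcomes do not leak in.
    encoding, fallback = shrunken_category_means(venues[train], runs_next_inning[train])
    venue_feature = np.array([encoding.get(v, fallback) for v in venues])

    X = np.column_stack([venue_feature, pitch_count])
    weights, intercept = fit_ridge(X[train], runs_next_inning[train], alpha=10.0)
    predictions = X[test] @ weights + intercept
    print(predictions[:3])
```

Fitting the encoding on the training games only mirrors the chronological split described in the abstract: the held-out 20% of games never influences the features used to predict them.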
