A data-driven method for in-game decision making in MLB: when to pull a starting pitcher

Gartheeban Ganeshapillai, J. Guttag
Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '13), published 2013-08-11. DOI: 10.1145/2487575.2487660
Citation count: 16

Abstract

Professional sports is a roughly $500 billion industry that is increasingly data-driven. In this paper we show how machine learning can be applied to generate a model that could lead to better on-field decisions by managers of professional baseball teams. Specifically, we show how to use regularized linear regression to learn pitcher-specific predictive models that can help decide when a starting pitcher should be replaced. A key step in the process is our method of converting categorical variables (e.g., the venue in which a game is played) into continuous variables suitable for the regression. Another key step is handling situations in which there is too little data to compute measures such as the effectiveness of a pitcher against specific batters. For each season, we trained on the first 80% of the games and tested on the rest. The results suggest that using our model could have led to better decisions than those made by major league managers. Applying our model would have led to a different decision 48% of the time. In games where the manager left in a pitcher whom our model would have removed, the pitcher ended up performing poorly 60% of the time.
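The abstract does not spell out how categorical variables such as the venue are turned into continuous ones. As an illustration only, the sketch below uses mean (target) encoding, a common technique that may differ from the authors' method: each venue is replaced by the average outcome observed there in the training games, and venues never seen in training fall back to the overall training mean. The function names and the toy numbers in the usage note are hypothetical.

```python
from collections import defaultdict


def fit_mean_encoding(categories, targets):
    """Learn a category -> mean(target) map from training rows (hypothetical helper)."""
    if not targets:
        raise ValueError("need at least one training row")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    global_mean = sum(targets) / len(targets)
    encoding = {cat: sums[cat] / counts[cat] for cat in sums}
    return encoding, global_mean


def apply_mean_encoding(categories, encoding, default):
    """Replace each category by its learned mean; unseen categories get `default`."""
    return [encoding.get(cat, default) for cat in categories]


# Made-up training data: venue of each game and runs allowed by the starter.
venues = ["Fenway", "Coors", "Fenway", "Petco"]
runs = [1.0, 3.0, 0.0, 0.5]
enc, overall = fit_mean_encoding(venues, runs)
print(apply_mean_encoding(["Coors", "Wrigley"], enc, overall))
```

With these made-up numbers, "Fenway" encodes to 0.5, "Coors" to 3.0, and an unseen venue such as "Wrigley" gets the overall mean of 1.125. Fitting the encoding only on the training games keeps information from the test games out of the features.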
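For sparse measures such as a pitcher's record against one particular batter, the abstract says only that the lack of data must be dealt with. One standard remedy, shown here as an assumption rather than the paper's procedure, is shrinkage: blend the observed rate with a broader prior rate (for instance, the pitcher's overall rate), weighting the prior as if it were `prior_weight` extra observations. The name `shrunk_rate` and the default `prior_weight` of 20 are hypothetical.

```python
def shrunk_rate(successes, trials, prior_rate, prior_weight=20.0):
    """Blend an observed rate with a prior rate (hypothetical helper).

    The prior acts like `prior_weight` pseudo-observations, so small
    samples lean on the prior and large samples lean on the data.
    """
    if trials < 0 or prior_weight < 0:
        raise ValueError("trials and prior_weight must be non-negative")
    if trials + prior_weight == 0:
        # No data and no prior weight: nothing to blend, return the prior.
        return prior_rate
    return (successes + prior_weight * prior_rate) / (trials + prior_weight)


# Made-up example: 3 hits allowed in 4 at-bats against one batter,
# with a prior (e.g. overall) hit rate of 0.25.
print(round(shrunk_rate(3, 4, 0.25), 3))
```

Here the observed 0.750 is pulled down to (3 + 20 * 0.25) / (4 + 20), about 0.333. With no head-to-head history the estimate falls back to the prior of 0.250, and with hundreds of at-bats it approaches the observed rate.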
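The abstract names regularized linear regression and a per-season split in which the first 80% of games are used for training and the rest for testing. The sketch below assumes L2 (ridge) regularization, which the abstract does not specify, and solves the regularized normal equations in plain Python. The target (some per-pitcher performance measure) and the names `fit_ridge`, `alpha`, and `chronological_split` are hypothetical. The split is chronological rather than random, so the model is always evaluated on games played after the ones it learned from.

```python
def chronological_split(games, train_fraction=0.8):
    """Split games (assumed already sorted by date) into the first 80% and the rest."""
    cut = int(len(games) * train_fraction)
    return games[:cut], games[cut:]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(X, y, alpha=1.0):
    """Minimise ||y - Xw - b||^2 + alpha * ||w||^2; the intercept b is not penalised."""
    n_features = len(X[0])
    Z = [list(row) + [1.0] for row in X]  # constant column for the intercept
    d = n_features + 1
    A = [[sum(z[i] * z[j] for z in Z) for j in range(d)] for i in range(d)]
    for i in range(n_features):  # add alpha to feature diagonal only
        A[i][i] += alpha
    rhs = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(d)]
    coef = solve(A, rhs)
    return coef[:-1], coef[-1]


def predict(X, weights, intercept):
    return [sum(w * v for w, v in zip(weights, row)) + intercept for row in X]


# Toy data on the line y = 2x + 1.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
print(fit_ridge(X, y, alpha=0.0))
print(fit_ridge(X, y, alpha=10.0))
```

On this toy line, `alpha=0.0` recovers a slope of 2 and an intercept of 1, while `alpha=10.0` shrinks the slope to 1 (intercept 3). Leaving the intercept unpenalized means regularization pulls the feature weights toward zero without also pulling the overall prediction level toward zero.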
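The two evaluation figures in the abstract (48% and 60%) are ratios of the kind computed below. In this sketch the names are hypothetical, and because the abstract does not define "performing poorly", that flag is assumed to be supplied by the caller for each game. The first rate is the share of decisions on which the model and the manager disagree; the second is, among games where the manager kept the pitcher in although the model would have removed him, the share in which the pitcher then performed poorly.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Decision:
    model_pulls: bool                 # would the model have replaced the pitcher?
    manager_pulled: bool              # did the manager actually replace him?
    performed_poorly: Optional[bool]  # outcome if he stayed in; None if pulled


def evaluate(decisions: List[Decision]) -> Tuple[float, float]:
    """Return (disagreement rate, poor-outcome rate when the model would have pulled)."""
    if not decisions:
        return 0.0, 0.0
    disagreements = sum(d.model_pulls != d.manager_pulled for d in decisions)
    kept_in = [d for d in decisions if d.model_pulls and not d.manager_pulled]
    poor = sum(bool(d.performed_poorly) for d in kept_in)
    poor_rate = poor / len(kept_in) if kept_in else 0.0
    return disagreements / len(decisions), poor_rate


# Four made-up records.
records = [
    Decision(model_pulls=True, manager_pulled=False, performed_poorly=True),
    Decision(model_pulls=True, manager_pulled=False, performed_poorly=False),
    Decision(model_pulls=False, manager_pulled=False, performed_poorly=False),
    Decision(model_pulls=True, manager_pulled=True, performed_poorly=None),
]
print(evaluate(records))
```

With these four made-up records, two disagreements out of four give a disagreement rate of 0.5, and one poor outing in the two kept-in games gives a poor-outcome rate of 0.5.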
Latest articles from this venue:
- A general bootstrap performance diagnostic
- Flexible and robust co-regularized multi-domain graph clustering
- Beyond myopic inference in big data pipelines
- Constrained stochastic gradient descent for large-scale least squares problem
- Inferring distant-time location in low-sampling-rate trajectories