Predicting Win-Loss outcomes in MLB regular season games – A comparative study using data mining methods

Soto Valero
DOI: 10.1515/IJCSS-2016-0007 (https://doi.org/10.1515/IJCSS-2016-0007)
Journal: International Journal of Computer Science in Sport
Published: 2016-12-01
Publication type: Journal Article
JCR: Q2 (Computer Science)
Citations: 1

Abstract

Baseball is a statistics-rich sport, and predicting the winner of a particular Major League Baseball (MLB) game is an interesting and challenging task. There is so far no definitive formula for determining which factors lead a team to victory, but analyzing many years of historical records can reveal trends. Recent studies have concentrated on using and generating new statistics, called sabermetrics, to rank teams and players according to their perceived strengths, and then applying these rankings to forecast specific games. In this paper, we employ sabermetric statistics to assess the predictive capabilities of four data mining methods (classification and regression based) for predicting outcomes (win or loss) in MLB regular season games. Our modeling approach uses only past data when making a prediction, drawing on ten years of publicly available data. We create a dataset of cumulative sabermetric statistics for each MLB team during this period, constructed so that data contamination is not possible. The inherent difficulty of this specific sports prediction task is confirmed using two geometry- or topology-based measures of data complexity. Results reveal that the classification scheme forecasts game outcomes better than the regression scheme and that, of the four data mining methods used, SVMs produce the best predictive results, with a mean prediction accuracy of nearly 60% for each team. The model is evaluated using stratified 10-fold cross-validation.
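The evaluation protocol named in the abstract, stratified 10-fold cross-validation, can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the data are synthetic, the single feature is a hypothetical "strength differential", and the midpoint-threshold classifier is a stand-in for the paper's methods (it is not an SVM). The function names `stratified_k_fold`, `cross_validate`, `fit_threshold`, and `predict_threshold` are invented for this sketch.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Split indices into k folds that preserve the win/loss ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin; carrying the offset across classes
        # keeps the overall fold sizes balanced as well.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Hold out each fold once and return the list of per-fold accuracies."""
    accuracies = []
    for test_idx in stratified_k_fold(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        hits = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies


def fit_threshold(xs, ys):
    """Toy stand-in classifier: midpoint between the two class means."""
    wins = [x for x, y in zip(xs, ys) if y == 1]
    losses = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(wins) / len(wins) + sum(losses) / len(losses)) / 2


def predict_threshold(threshold, x):
    """Predict a win (1) when the feature exceeds the learned threshold."""
    return 1 if x > threshold else 0


# Synthetic data (assumption): 110 wins and 90 losses with one noisy feature.
rng = random.Random(42)
labels = [1] * 110 + [0] * 90
features = [rng.gauss(0.05 if y == 1 else -0.05, 0.2) for y in labels]

scores = cross_validate(features, labels, fit_threshold, predict_threshold)
mean_accuracy = sum(scores) / len(scores)
print(f"mean accuracy over {len(scores)} folds: {mean_accuracy:.3f}")
```

Stratifying matters here because each fold then contains the same share of wins and losses as the full dataset, so the per-fold accuracies are comparable. Note that this sketch shuffles games freely; it does not by itself enforce the "past data only" constraint the abstract describes, which would have to come from how the cumulative features are built.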
Source journal
International Journal of Computer Science in Sport (Computer Science: all)
CiteScore: 2.20
Self-citation rate: 0.00%
Articles published: 4
Review time: 12 weeks