PGD: A Large-scale Professional Go Dataset for Data-driven Analytics

Yifan Gao
Published in: 2022 IEEE Conference on Games (CoG)
DOI: 10.1109/CoG51982.2022.9893704 (https://doi.org/10.1109/CoG51982.2022.9893704)
Publication date: 2022-04-30
Citations: 1

Abstract

Lee Sedol is on a winning streak: will this legend rise again after his match against AlphaGo? Ke Jie is invincible in the world championship: can he still win the title this time? Go is one of the most popular board games in East Asia, with a stable professional system that has lasted for decades in China, Japan, and Korea. Mature data-driven analysis technologies exist for many sports, such as soccer, basketball, and esports. However, developing such technology for Go remains challenging because of the lack of datasets, meta-information, and in-game statistics. This paper presents the Professional Go Dataset (PGD), containing 98,043 games played by 2,148 professional players from 1950 to 2021. After manual cleaning and labeling, we provide detailed meta-information for each player, game, and tournament. The dataset also includes analysis results for every move of each game, evaluated by an advanced AlphaZero-based AI. To establish a benchmark for PGD, we further analyze the data and, drawing on prior knowledge of Go, extract meaningful in-game features that indicate the game status. With complete meta-information and the constructed in-game features, our result prediction system achieves an accuracy of 75.30%, well above several state-of-the-art approaches (64%-65%). As far as we know, PGD is the first dataset for data-driven analytics in Go, and even in board games in general. Beyond this promising result, we give further examples of tasks that can benefit from the dataset. The ultimate goal of this paper is to bridge this ancient game and the modern data science community. We expect it to advance research on Go-related analytics that enhances the fan experience, helps players improve, and supports other promising applications. The dataset will be made publicly available.
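To make the idea of in-game features derived from per-move AI analysis more concrete, here is a minimal sketch in Python. It is not the paper's feature set or PGD's actual schema: the input format (a list of Black win probabilities, one per position), the function name `winrate_loss_features`, and the feature names are all assumptions chosen for illustration. It also assumes Black plays first, which does not hold for some handicap games.

```python
from statistics import mean


def winrate_loss_features(black_winrates):
    """Summarize a game from per-position AI evaluations (hypothetical format).

    black_winrates[0] is Black's estimated win probability before the first
    move; black_winrates[i] is the estimate after move i. Black is assumed
    to play the odd-numbered moves (1, 3, 5, ...).
    """
    black_losses, white_losses = [], []
    for i in range(1, len(black_winrates)):
        delta = black_winrates[i] - black_winrates[i - 1]
        if i % 2 == 1:
            # Black moved: a drop in Black's win rate counts as Black's loss.
            black_losses.append(max(0.0, -delta))
        else:
            # White moved: a rise in Black's win rate counts as White's loss.
            white_losses.append(max(0.0, delta))
    return {
        "black_mean_loss": mean(black_losses) if black_losses else 0.0,
        "white_mean_loss": mean(white_losses) if white_losses else 0.0,
        "final_black_winrate": black_winrates[-1],
    }


if __name__ == "__main__":
    # Toy evaluations for a four-move fragment, not real PGD data.
    print(winrate_loss_features([0.50, 0.45, 0.55, 0.55, 0.50]))
```

Features of this kind could be combined with player and tournament meta-information as inputs to a result prediction model; which features and which model the paper actually uses is not stated in the abstract.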