通过深度强化学习和信念蒙特卡洛搜索进行桥牌竞价

IF 19.2 1区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Ieee-Caa Journal of Automatica Sinica Pub Date : 2024-09-04 DOI:10.1109/JAS.2024.124488

Zizhang Qiu;Shouguang Wang;Dan You;MengChu Zhou

{"title":"通过深度强化学习和信念蒙特卡洛搜索进行桥牌竞价","authors":"Zizhang Qiu;Shouguang Wang;Dan You;MengChu Zhou","doi":"10.1109/JAS.2024.124488","DOIUrl":null,"url":null,"abstract":"Contract Bridge, a four-player imperfect information game, comprises two phases: bidding and playing. While computer programs excel at playing, bidding presents a challenging aspect due to the need for information exchange with partners and interference with communication of opponents. In this work, we introduce a Bridge bidding agent that combines supervised learning, deep reinforcement learning via self-play, and a test-time search approach. Our experiments demonstrate that our agent outperforms WBridge5, a highly regarded computer Bridge software that has won multiple world championships, by a performance of 0.98 IMPs (international match points) per deal over 10 000 deals, with a much cost-effective approach. The performance significantly surpasses previous state-of-the-art (0.85 IMPs per deal). Note 0.1 IMPs per deal is a significant improvement in Bridge bidding.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"11 10","pages":"2111-2122"},"PeriodicalIF":19.2000,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Bridge Bidding via Deep Reinforcement Learning and Belief Monte Carlo Search\",\"authors\":\"Zizhang Qiu;Shouguang Wang;Dan You;MengChu Zhou\",\"doi\":\"10.1109/JAS.2024.124488\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Contract Bridge, a four-player imperfect information game, comprises two phases: bidding and playing. While computer programs excel at playing, bidding presents a challenging aspect due to the need for information exchange with partners and interference with communication of opponents. In this work, we introduce a Bridge bidding agent that combines supervised learning, deep reinforcement learning via self-play, and a test-time search approach. Our experiments demonstrate that our agent outperforms WBridge5, a highly regarded computer Bridge software that has won multiple world championships, by a performance of 0.98 IMPs (international match points) per deal over 10 000 deals, with a much cost-effective approach. The performance significantly surpasses previous state-of-the-art (0.85 IMPs per deal). Note 0.1 IMPs per deal is a significant improvement in Bridge bidding.\",\"PeriodicalId\":54230,\"journal\":{\"name\":\"Ieee-Caa Journal of Automatica Sinica\",\"volume\":\"11 10\",\"pages\":\"2111-2122\"},\"PeriodicalIF\":19.2000,\"publicationDate\":\"2024-09-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Ieee-Caa Journal of Automatica Sinica\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10664606/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ieee-Caa Journal of Automatica Sinica","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10664606/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

契约桥牌是一种四人不完全信息游戏，包括两个阶段：竞标和下注。虽然计算机程序擅长下棋，但由于需要与合作伙伴交换信息并干扰对手的交流，竞标是一个具有挑战性的方面。在这项工作中，我们介绍了一种桥牌竞标代理，它结合了监督学习、通过自我比赛进行的深度强化学习以及测试时间搜索方法。我们的实验证明，我们的代理在 10,000 次交易中，以每交易 0.98 IMPs（国际比赛积分）的成绩超越了 WBridge5（一款备受推崇的计算机桥牌软件，曾多次获得世界冠军），而且采用的方法更具成本效益。这一成绩大大超过了以前的先进水平（每盘 0.85 IMPs）。注意：每局 0.1 IMPs 是桥牌竞标中的一项重大改进。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Bridge Bidding via Deep Reinforcement Learning and Belief Monte Carlo Search

Contract Bridge, a four-player imperfect information game, comprises two phases: bidding and playing. While computer programs excel at playing, bidding presents a challenging aspect due to the need for information exchange with partners and interference with communication of opponents. In this work, we introduce a Bridge bidding agent that combines supervised learning, deep reinforcement learning via self-play, and a test-time search approach. Our experiments demonstrate that our agent outperforms WBridge5, a highly regarded computer Bridge software that has won multiple world championships, by a performance of 0.98 IMPs (international match points) per deal over 10 000 deals, with a much cost-effective approach. The performance significantly surpasses previous state-of-the-art (0.85 IMPs per deal). Note 0.1 IMPs per deal is a significant improvement in Bridge bidding.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Ieee-Caa Journal of Automatica Sinica Engineering-Control and Systems Engineering

CiteScore

23.50

自引率

11.00%

发文量

880

期刊介绍： The IEEE/CAA Journal of Automatica Sinica is a reputable journal that publishes high-quality papers in English on original theoretical/experimental research and development in the field of automation. The journal covers a wide range of topics including automatic control, artificial intelligence and intelligent control, systems theory and engineering, pattern recognition and intelligent systems, automation engineering and applications, information processing and information systems, network-based automation, robotics, sensing and measurement, and navigation, guidance, and control. Additionally, the journal is abstracted/indexed in several prominent databases including SCIE (Science Citation Index Expanded), EI (Engineering Index), Inspec, Scopus, SCImago, DBLP, CNKI (China National Knowledge Infrastructure), CSCD (Chinese Science Citation Database), and IEEE Xplore.

期刊最新文献

Inside front cover Inside back cover Front cover Back cover From Shallow to Deep: A Novel Correlation Network Representation Regression Framework for Modeling and Monitoring MIQ-Driven Blast Furnace Ironmaking Processes