AAAI Workshop: Computer Poker and Imperfect Information: Latest Articles


Decision-Theoretic Clustering of Strategies

AAAI Workshop: Computer Poker and Imperfect Information

Pub Date : 2015-05-04 DOI: 10.5555/2772879.2772886

Nolan Bard, D. Nicholas, Csaba Szepesvari, Michael Bowling

Clustering agents by their behaviour can be crucial for building effective agent models. Traditional clustering typically aims to group entities together based on a distance metric, where a desirable clustering is one where the entities in a cluster are spatially close together. Instead, one may desire to cluster based on actionability, or the capacity for the clusters to suggest how an agent should respond to maximize their utility with respect to the entities. Segmentation problems examine this decision-theoretic clustering task. Although finding optimal solutions to these problems is computationally hard, greedy-based approximation algorithms exist. However, in settings where the agent has a combinatorially large number of candidate responses whose utilities must be considered, these algorithms are often intractable. In this work, we show that in many cases the utility function can be factored to allow for an efficient greedy algorithm even when there are exponentially large response spaces. We evaluate our technique theoretically, proving approximation bounds, and empirically using extensive-form games by clustering opponent strategies in toy poker games. Our results demonstrate that these techniques yield dramatically improved clusterings compared to a traditional distance-based clustering approach in terms of both subjective quality and utility obtained by responding to the clusters.
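The greedy approach to segmentation problems mentioned in the abstract can be illustrated with a minimal sketch. It assumes a small, explicit table of non-negative utilities (one row per candidate response, one column per entity), which is exactly the setting the abstract says becomes intractable for large response spaces; the paper's factored utility technique is not shown. The function name `greedy_segmentation` and the toy numbers are hypothetical.

```python
def greedy_segmentation(utility, k):
    """Greedily pick up to k responses, then cluster entities by best response.

    utility[r][e] is the (assumed non-negative) payoff of playing candidate
    response r against entity e. This is an illustrative sketch, not the
    paper's factored algorithm.
    """
    n_responses = len(utility)
    n_entities = len(utility[0])
    chosen = []
    best = [0.0] * n_entities  # best payoff achieved so far against each entity

    for _ in range(k):
        candidates = [r for r in range(n_responses) if r not in chosen]
        if not candidates:
            break

        def marginal_gain(r):
            # Improvement in total payoff if response r were added.
            return sum(max(utility[r][e] - best[e], 0.0) for e in range(n_entities))

        r_star = max(candidates, key=marginal_gain)
        chosen.append(r_star)
        best = [max(best[e], utility[r_star][e]) for e in range(n_entities)]

    # Each entity joins the cluster of the chosen response that does best against it.
    clusters = {r: [] for r in chosen}
    for e in range(n_entities):
        owner = max(chosen, key=lambda r: utility[r][e])
        clusters[owner].append(e)

    return chosen, clusters, sum(best)


# Toy example: 3 candidate responses, 4 opponent strategies.
toy_utility = [
    [3, 3, 0, 0],
    [0, 0, 2, 2],
    [2, 2, 2, 0],
]
chosen, clusters, total = greedy_segmentation(toy_utility, k=2)
```

Here the clusters are defined by which response serves each entity best rather than by any distance between entities, which is the contrast with distance-based clustering that the abstract draws.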



Citations: 10

Hierarchical Abstraction, Distributed Equilibrium Computation, and Post-Processing, with Application to a Champion No-Limit Texas Hold'em Agent

AAAI Workshop: Computer Poker and Imperfect Information

Pub Date : 2015-05-04 DOI: 10.5555/2772879.2772885

Noam Brown, Sam Ganzfried, T. Sandholm

The leading approach for solving large imperfect-information games is automated abstraction followed by running an equilibrium-finding algorithm. We introduce a distributed version of the most commonly used equilibrium-finding algorithm, counterfactual regret minimization (CFR), which enables CFR to scale to dramatically larger abstractions and numbers of cores. The new algorithm begets constraints on the abstraction so as to make the pieces running on different computers disjoint. We introduce an algorithm for generating such abstractions while capitalizing on state-of-the-art abstraction ideas such as imperfect recall and earth-mover's distance. Our techniques enabled an equilibrium computation of unprecedented size on a supercomputer with a high inter-blade memory latency. Prior approaches run slowly on this architecture. Our approach also leads to a significant improvement over using the prior best approach on a large shared-memory server with low memory latency. Finally, we introduce a family of post-processing techniques that outperform prior ones. We applied these techniques to generate an agent for two-player no-limit Texas Hold'em, called Tartanian7, that won the 2014 Annual Computer Poker Competition, beating each opponent with statistical significance.
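The abstract names earth-mover's distance as one of the abstraction ideas it builds on but does not say how it is applied. As a generic illustration only, the sketch below computes the earth-mover's distance between two histograms of equal total mass over the same unit-spaced bins, where it reduces to the sum of absolute cumulative differences. The function name `emd_1d` and the example histograms are hypothetical and are not taken from the paper.

```python
def emd_1d(p, q):
    """Earth-mover's distance between two 1-D histograms.

    Assumes p and q have the same number of unit-spaced bins and the same
    total mass. Generic textbook form, not the paper's abstraction code.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    carried = 0.0  # mass that still has to be moved past the current bin
    total = 0.0
    for a, b in zip(p, q):
        carried += a - b
        total += abs(carried)
    return total


# Moving all mass from the first bin to the third costs 2 bin-widths.
far_apart = emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
# Shifting a two-bin distribution right by one bin costs 1 bin-width.
shifted = emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])
```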



Citations: 65

Solving Games with Functional Regret Estimation

AAAI Workshop: Computer Poker and Imperfect Information

Pub Date : 2014-11-28 DOI: 10.1609/aaai.v29i1.9445

K. Waugh, Dustin Morrill, J. Bagnell, Michael Bowling

We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation and regret of the algorithm. A corollary is that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction as well as the equilibrium are learned during self-play. We demonstrate empirically that the method achieves higher quality strategies than state-of-the-art abstraction techniques given the same resources.
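The step in which a no-regret algorithm turns estimated regrets into a policy can be sketched as follows. The abstract does not say which no-regret algorithm is used, so regret matching is an assumption here, and the estimated regrets are passed in as plain numbers rather than produced by a learned function approximator. The name `regret_matching_policy` is hypothetical.

```python
def regret_matching_policy(estimated_regrets):
    """Map per-action regret estimates to a probability distribution.

    Assumption: regret matching is used as the no-regret algorithm. Actions
    are played in proportion to their positive estimated regret; if no action
    has positive regret, the policy falls back to uniform.
    """
    positive = [max(r, 0.0) for r in estimated_regrets]
    total = sum(positive)
    n_actions = len(estimated_regrets)
    if total <= 0.0:
        return [1.0 / n_actions] * n_actions
    return [p / total for p in positive]


# Estimates as they might come from an approximator (values are made up).
policy = regret_matching_policy([3.0, 1.0, -2.0])
fallback = regret_matching_policy([-1.0, -2.0])
```

In the method described above, the inputs would be the approximator's outputs at the current decision point, so the quality of the policy depends on how well those estimates track the true regrets.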


Citations: 56
