Anticipating Oblivious Opponents in Stochastic Games

Shadi Tasdighi Kalat, Sriram Sankaranarayanan, Ashutosh Trivedi
arXiv - EE - Systems and Control. Published 2024-09-18. DOI: arxiv-2409.11671 (https://doi.org/arxiv-2409.11671)

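Abstract

We present an approach for systematically anticipating the actions and policies employed by oblivious environments in concurrent stochastic games, while maximizing a reward function. Our main contribution lies in the synthesis of a finite information state machine whose alphabet ranges over the actions of the environment. Each state of the automaton is mapped to a belief state about the policy used by the environment. We introduce a notion of consistency that guarantees that the belief states tracked by our automaton stay within a fixed distance of the precise belief state obtained by knowledge of the full history. We provide methods for checking the consistency of an automaton, and a synthesis approach which, upon successful termination, yields such a machine. We show how the information state machine yields an MDP that serves as the starting point for computing optimal policies that maximize a reward function defined over plays. We present an experimental evaluation over benchmark examples, including human activity data for tasks such as cataract surgery and furniture assembly, wherein our approach successfully anticipates the policies and actions of the environment in order to maximize the reward.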
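As a rough illustration of what a "precise belief state obtained by knowledge of the full history" could look like, the sketch below performs an exact Bayesian update over a finite set of candidate environment policies. This is not the paper's construction: the candidate policies in `POLICIES`, the uniform prior, the assumption that each policy is a fixed distribution over actions, and the use of total variation as the distance between beliefs are all assumptions made for this example.

```python
from fractions import Fraction

# Hypothetical candidate policies (an assumption for illustration):
# each one is a fixed probability distribution over environment actions.
POLICIES = {
    "p1": {"a": Fraction(3, 4), "b": Fraction(1, 4)},
    "p2": {"a": Fraction(1, 4), "b": Fraction(3, 4)},
}


def uniform_prior(policies):
    """Start with equal belief in every candidate policy."""
    n = len(policies)
    return {name: Fraction(1, n) for name in policies}


def update_belief(belief, action, policies):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    unnormalized = {
        name: belief[name] * policies[name].get(action, Fraction(0))
        for name in belief
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError(f"action {action!r} is impossible under every policy")
    return {name: weight / total for name, weight in unnormalized.items()}


def belief_after(history, policies):
    """Exact belief obtained by replaying the full history of actions."""
    belief = uniform_prior(policies)
    for action in history:
        belief = update_belief(belief, action, policies)
    return belief


def total_variation(b1, b2):
    """Total variation distance between two beliefs over the same policies."""
    return sum(abs(b1[name] - b2[name]) for name in b1) / 2
```

For example, `belief_after(["a", "a"], POLICIES)` assigns probability 9/10 to `p1`. A finite automaton whose states carry approximate beliefs could then be compared against this exact value with `total_variation`, which is one way to read the abstract's "within a fixed distance" requirement.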