Decentralized learning in multiple pursuer-evader Markov games

S. Givigi, H. Schwartz
2011 19th Mediterranean Conference on Control & Automation (MED), published 2011-06-20. DOI: 10.1109/MED.2011.5983135
Citations: 3

Abstract

We represent the multiple pursuers and evaders game as a Markov game and each player as a decentralized unit that has to work independently in order to complete a task. Most proposed solutions for this distributed multiagent decision problem require some sort of central coordination. In this paper, we intend to model each player as a learning automaton (LA) and let them evolve and adapt in order to solve the difficult problem they have at hand. We are also going to show that using the proposed learning process, the players' policies will converge to an equilibrium point. Simulations of such scenarios with multiple pursuers and evaders are presented in order to show the feasibility of the approach.
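The abstract does not specify which learning-automaton scheme the authors use, but a standard choice in this literature is the linear reward-inaction (L_R-I) update, in which each player shifts probability mass toward actions that were just rewarded. The sketch below is an illustrative toy, not the paper's implementation: the two-action coordination payoff, the learning rate, and the seeds are all assumptions standing in for the actual pursuit-evasion reward.

```python
import random

class LearningAutomaton:
    """Linear reward-inaction (L_R-I) automaton over a finite action set."""

    def __init__(self, n_actions, lr=0.05, rng=None):
        self.p = [1.0 / n_actions] * n_actions  # start from a uniform policy
        self.lr = lr
        self.rng = rng or random.Random()

    def act(self):
        # Sample an action according to the current probability vector.
        return self.rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, reward):
        # L_R-I: move probability toward the chosen action in proportion to
        # the reward (assumed in [0, 1]); a zero reward leaves p unchanged.
        # The update preserves sum(p) == 1 exactly.
        for j in range(len(self.p)):
            if j == action:
                self.p[j] += self.lr * reward * (1.0 - self.p[j])
            else:
                self.p[j] -= self.lr * reward * self.p[j]

# Toy stand-in for the pursuit reward: two players are paid 1 when they
# pick the same action, 0 otherwise, so the pure coordination profiles
# are the equilibrium points the policies should converge to.
pursuer = LearningAutomaton(2, lr=0.05, rng=random.Random(1))
evader = LearningAutomaton(2, lr=0.05, rng=random.Random(2))
for _ in range(2000):
    a, b = pursuer.act(), evader.act()
    r = 1.0 if a == b else 0.0
    pursuer.update(a, r)
    evader.update(b, r)

print(max(pursuer.p), max(evader.p))  # both drift toward 1 at coordination
```

Because L_R-I only updates on success, its stationary points are the pure-strategy profiles, which is the mechanism behind convergence claims of this kind; in a genuine Markov game each state would carry its own automaton per player.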